Abstract

In this paper we extend Shiryaev's quickest change detection formulation by also accounting
for the cost of observations used before the change point. The observation cost is captured through the 
average number of observations used in the detection process before the change occurs. The objective is to 
select an on-off observation control policy, that decides whether or not to take a given observation, along 
with the stopping time at which the change is declared, so as to minimize the average detection delay, 
subject to constraints on both the probability of false alarm and the observation cost. By considering a 
Lagrangian relaxation of the constrained problem, and using dynamic programming arguments, we obtain
an a posteriori probability based two-threshold algorithm that is a generalized version of the classical 
Shiryaev algorithm. We provide an asymptotic analysis of the two-threshold algorithm and show that the 
algorithm is asymptotically optimal, i.e., the performance of the two-threshold algorithm approaches that 
of the Shiryaev algorithm, for a fixed observation cost, as the probability of false alarm goes to zero. We 
also show, using simulations, that the two-threshold algorithm has good observation cost-delay trade-off 
curves, and provides significant reduction in observation cost as compared to the naive approach of 
fractional sampling, where samples are skipped randomly. Our analysis reveals that, for practical choices 
of constraints, the two thresholds can be set independent of each other: one based on the constraint of 
false alarm and another based on the observation cost constraint alone. 
I. Introduction

In the Bayesian quickest change detection problem proposed by Shiryaev [1], there is a sequence of
random variables, $\{X_n\}$, whose distribution changes at a random time $\Gamma$. It is assumed that before $\Gamma$,
$\{X_n\}$ are independent and identically distributed (i.i.d.) with density $f_0$, and after $\Gamma$ they are i.i.d. with
density $f_1$. The distribution of $\Gamma$ is assumed to be known and modeled as a geometric random variable
with parameter $\rho$. The objective is to find a stopping time $\tau$, at which time the change is declared, such
that the average detection delay is minimized subject to a constraint on the probability of false alarm.

In this paper we extend Shiryaev's formulation by explicitly accounting for the cost of the observations 
used in the detection process. We capture the observation penalty (cost) through the average number of 
observations used before the change point $\Gamma$, and allow for a dynamic control policy that determines
whether or not a given observation is taken. The objective is to choose the observation control policy
along with the stopping time $\tau$, so that the average detection delay is minimized subject to constraints
on the probability of false alarm and the observation cost. The motivation for this model comes from the 
consideration of the following engineering applications. 

In many monitoring applications, for example infrastructure monitoring, environment monitoring, or 
habitat monitoring, especially of endangered species, surveillance is only possible through the use of 
inexpensive battery operated sensor nodes. This could be due to the high cost of employing a wired 
sensor network or a human observer, or the infeasibility of human intervention. For example, in the
habitat monitoring of certain sea-birds reported in [?], the very reason the birds chose the habitat was
the absence of humans and predators around it. In these applications the sensors are typically
deployed for long durations, possibly over months, and due to the constraint on energy, the most effective
way to save energy at the sensors is to switch the sensor between on and off states. An energy-efficient
quickest change detection algorithm can be employed here that can operate over months and trigger
other more sophisticated and costly sensors, which are possibly power hungry, or more generally, trigger
a larger part of the sensor network [10]. This change could be a fault in the structures in infrastructure
monitoring [10], the arrival of the species to the habitat [?], etc.

In industrial quality control, statistical control charts are designed that can detect a sustained deviation 
of the industrial process from normal behavior [?]. Often there is a cost associated with acquiring the
statistics for the control charts and it is of interest to consider designing economic-statistical control chart
schemes [2], [?], [?], [?], [6], [?], [?]. One approach to economic-statistical control chart design has been
to use algorithms from the change detection literature, such as Shewhart, EWMA and CUSUM, as control
charts, and optimize over the choice of sample size, sampling interval and control limits [3], [4]. Another
approach has been to find optimal sampling rates in the problem of detection of a change in the drift
of a sequence of Brownian motions with global false alarm constraint [5], [6]. Thus, these approaches
are essentially non-Bayesian. It has been demonstrated, mostly through numerical results, that Bayesian
control charts, which choose the parameters of the detection algorithms based on the posterior probability
that the process is out of control, perform better than the traditional control charts based on Shewhart,
EWMA or CUSUM; see [?], and the references therein. The problem of dynamic sampling for detecting
a change in the drift of a standard Brownian motion is considered for an exponentially distributed change
point in [?]. For practical applications, it is of interest to consider the economic design of Bayesian
control charts in discrete time. The design of a Bayesian economic-statistical control chart is considered
for a shift in the mean vector of a multivariate Gaussian model in [?]. But, the problem is modeled as
an optimal stopping problem that minimizes the long term average cost, and hence, there is no control 
on the number of observations used at each time step. The process control problem is fundamentally a 
quickest change detection problem, and it is therefore appropriate that economic-statistical schemes for 
process control are developed in this framework. 

In most of the above mentioned or similar applications, changes are rare and quick detection is often 
required. So, ideally we would like to take as few observations as possible before change to reduce 
the observation cost, and skip as few as possible after change to minimize delay, while maintaining an 
acceptable probability of false alarm. 

There have been other formulations of the Bayesian quickest change detection problem that are relevant 
to sensor networks: see [11]-[15]. The change detection problem studied here was earlier considered in
a similar set-up for sensor networks in [16]. But owing to the complexity of the problem, the structure
of the optimal policy was studied only numerically, and for the same reason, no analytical expressions 
were developed for the performance. 

The goal of this paper is to develop a deeper understanding of the trade-off between delay, false 
alarm probability, and the cost of observation or information, and to identify a control policy for
data-efficient quickest change detection that has some optimality property and is easy to design. We extend
Shiryaev's formulation by also accounting for the cost of observations used before the change point,
and obtain an a posteriori probability based two-threshold algorithm that is asymptotically optimal.
Specifically, we show that the probability of false alarm and the average detection delay of the
two-threshold algorithm approach those of the Shiryaev algorithm, for a fixed observation cost constraint,
as the probability of false alarm goes to zero. Even for moderate values of the false alarm probability,
we will show using simulations that the two-threshold algorithm provides good performance. We also
provide an asymptotic analysis of the two-threshold algorithm, i.e., we obtain expressions for the delay, 
probability of false alarm and the average number of observations used before and after change, using 
which the thresholds can be set to meet the constraints on probability of false alarm and observation 
cost. 

The layout of the paper is as follows. In the following section, we set up the data-efficient quickest 
change detection problem with on-off observation control and introduce the two-threshold algorithm. 
In Section III we provide an asymptotic analysis of the two-threshold algorithm. In Section IV we
provide approximations using which the analytical expressions in Section III can be computed, and
validate the approximations by comparing them with the corresponding values obtained via simulations.
In Section V we prove the asymptotic optimality of the two-threshold algorithm, provide its false
alarm-delay-observation cost trade-off curves and also compare its performance with the naive approach of
fractional sampling, where observations are skipped randomly. 

II. Problem Formulation and the Two-threshold Algorithm 

As in the model for the classical Bayesian quickest change detection problem described in Section I,
we have a sequence of random variables $\{X_n\}$ which are i.i.d. with density $f_0$ before the random change
point $\Gamma$, and i.i.d. with density $f_1$ after $\Gamma$. The change point $\Gamma$ is modeled as geometric with parameter
$\rho$, i.e., for $0 < \rho < 1$, $0 \le \pi_0 < 1$,

$$\pi_k = P\{\Gamma = k\} = \pi_0 \, \mathbb{I}_{\{k=0\}} + (1-\pi_0)\,\rho\,(1-\rho)^{k-1} \, \mathbb{I}_{\{k \ge 1\}},$$

where $\mathbb{I}$ is the indicator function, and $\pi_0$ represents the probability of the change having happened before
the observations are taken. Typically $\pi_0$ is set to 0.

In order to minimize the average number of observations used before $\Gamma$, at each time instant, a decision
is made on whether to use the observation in the next time step, based on all the available information.
Let $S_k \in \{0, 1\}$, with $S_k = 1$ if it has been decided to take the observation at time $k$, i.e., $X_k$ is available
for decision making, and $S_k = 0$ otherwise. Thus, $S_k$ is an on-off (binary) control input based on the
information available up to time $k-1$, i.e.,

$$S_k = \mu_{k-1}(I_{k-1}), \quad k = 1, 2, \ldots$$

with $\mu$ denoting the control law and $I_k$ defined as

$$I_k = \left[ S_1, \ldots, S_k, X_1^{(S_1)}, \ldots, X_k^{(S_k)} \right].$$

Here, $X_i^{(S_i)}$ represents $X_i$ if $S_i = 1$; otherwise $X_i$ is absent from the information vector $I_k$. The choice
of $S_1$ is based on the prior $\pi_0$.

As in the classical change detection problem, the end goal is to choose a stopping time on the 
observation sequence at which time the change is declared. Denoting the stopping time by $\tau$, we can
define the average detection delay (ADD) as

$$\mathrm{ADD} = E\left[(\tau - \Gamma)^+\right].$$

Further, we can define the probability of false alarm (PFA) as

$$\mathrm{PFA} = P(\tau < \Gamma).$$

The new performance metric for our problem is the average number of observations (ANO) used before
$\Gamma$ in detecting the change:

$$\mathrm{ANO} = E\left[\sum_{k=1}^{\min\{\tau,\,\Gamma-1\}} S_k\right].$$

Let $\gamma = \{\tau, \mu_0, \ldots, \mu_{\tau-1}\}$ represent a policy for cost-efficient quickest change detection. We wish to
solve the following optimization problem:

$$\min_{\gamma} \ \mathrm{ADD}(\gamma) \quad \text{subject to} \quad \mathrm{PFA}(\gamma) \le \alpha, \ \text{and} \ \mathrm{ANO}(\gamma) \le \beta, \tag{1}$$

where $\alpha$ and $\beta$ are given constraints. Towards solving (1), we consider a Lagrangian relaxation of this
problem which can be approached using dynamic programming:

$$J^* = \min_{\gamma} \ \mathrm{ADD}(\gamma) + \lambda_f \, \mathrm{PFA}(\gamma) + \lambda_e \, \mathrm{ANO}(\gamma), \tag{2}$$

where $\lambda_f$ and $\lambda_e$ are Lagrange multipliers. It is easy to see that if $\lambda_f$ and $\lambda_e$ can be found such that the
solution to (2) achieves the PFA and ANO constraints with equality, then the solution to (2) is also the
solution to (1).

The problem in (2) can be converted to an appropriate Markov control problem using steps similar to
those followed in [16].

Let $\Theta_k$ denote the state of the system at time $k$. After the stopping time $\tau$ it is assumed that the system
enters a terminal state $\mathcal{T}$ and stays there. For $k \le \tau$, we have $\Theta_k = 0$ for $k < \Gamma$, and $\Theta_k = 1$ otherwise.
Then we can write

$$\mathrm{ADD} = E\left[\sum_{k=0}^{\tau-1} \mathbb{I}_{\{\Theta_k = 1\}}\right] \quad \text{and} \quad \mathrm{PFA} = E\left[\mathbb{I}_{\{\Theta_\tau = 0\}}\right].$$

Furthermore, let $D_k$ denote the stopping decision variable at time $k$, i.e., $D_k = 0$ if $k < \tau$ and $D_k = 1$
otherwise. Then the optimization problem in (2) can be written as a minimization of an additive cost
over time:

$$J^* = \min_{\gamma} E\left[\sum_{k=0}^{\tau} g_k(\Theta_k, D_k, S_k)\right]$$

with

$$g_k(\theta, d, s) = \mathbb{I}_{\{\theta \ne \mathcal{T}\}} \left[ \mathbb{I}_{\{\theta=1\}}\mathbb{I}_{\{d=0\}} + \lambda_f \, \mathbb{I}_{\{\theta=0\}}\mathbb{I}_{\{d=1\}} + \lambda_e \, \mathbb{I}_{\{\theta=0\}}\mathbb{I}_{\{s=1\}}\mathbb{I}_{\{d=0\}} \right].$$

Using standard arguments [?] it can be seen that this optimization problem can be solved using infinite
horizon dynamic programming with sufficient statistic (belief state) given by:

$$p_k = P\{\Theta_k = 1 \mid I_k\} = P\{\Gamma \le k \mid I_k\}.$$

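Using Bayes' rule, $p_k$ can be shown to satisfy the recursion

$$p_{k+1} = \begin{cases} \Phi^{(0)}(p_k) & \text{if } S_{k+1} = 0, \\ \Phi^{(1)}(X_{k+1}, p_k) & \text{if } S_{k+1} = 1, \end{cases}$$

where

$$\Phi^{(0)}(p_k) = p_k + (1 - p_k)\rho \tag{3}$$

and

$$\Phi^{(1)}(X_{k+1}, p_k) = \frac{\Phi^{(0)}(p_k)\, L(X_{k+1})}{\Phi^{(0)}(p_k)\, L(X_{k+1}) + 1 - \Phi^{(0)}(p_k)} \tag{4}$$

with $L(X_{k+1}) = f_1(X_{k+1})/f_0(X_{k+1})$ being the likelihood ratio, and $p_0 = \pi_0$. Note that the structure of
the recursion for $p_k$ is independent of time $k$.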
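The posterior recursion can be sketched in a few lines of Python. This is only an illustration: the function names (`phi0`, `phi1`, `gaussian_lr`, `update`) are hypothetical, and the Gaussian likelihood ratio with common variance is an assumption chosen to match the examples used later in the paper.

```python
import math

def phi0(p, rho):
    """Prior-only update (3): posterior after one step without an observation."""
    return p + (1.0 - p) * rho

def gaussian_lr(x, mu1, mu0=0.0, sigma=1.0):
    """Likelihood ratio f1(x)/f0(x) for two Gaussians with common variance (assumed model)."""
    return math.exp(((mu1 - mu0) * x - 0.5 * (mu1 ** 2 - mu0 ** 2)) / sigma ** 2)

def phi1(x, p, rho, lr):
    """Observation update (4): Bayes' rule applied to the predicted posterior phi0(p)."""
    q = phi0(p, rho)
    num = q * lr(x)
    return num / (num + (1.0 - q))

def update(p, rho, s_next, x_next=None, lr=None):
    """One step of the recursion: use (4) if the observation is taken, else (3)."""
    return phi1(x_next, p, rho, lr) if s_next == 1 else phi0(p, rho)
```

For instance, `update(0.2, 0.05, 0)` moves the posterior using the prior alone, while `update(0.2, 0.05, 1, x, lambda v: gaussian_lr(v, 0.75))` also folds in an observation `x`.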

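The optimal policy for the problem given in (2) can be obtained from the solution to the Bellman
equation:

$$J(p_k) = \min_{d_k,\, s_{k+1}} \ \lambda_f (1 - p_k)\, \mathbb{I}_{\{d_k=1\}} + \mathbb{I}_{\{d_k=0\}} \left[ p_k + A_J(p_k) \right], \tag{5}$$

where

$$A_J(p_k) = B_0(p_k)\, \mathbb{I}_{\{s_{k+1}=0\}} + \left( \lambda_e (1 - p_k) + B_1(p_k) \right) \mathbb{I}_{\{s_{k+1}=1\}},$$

with

$$B_0(p_k) = J(\Phi^{(0)}(p_k))$$

and

$$B_1(p_k) = E\left[J(\Phi^{(1)}(X_{k+1}, p_k))\right].$$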
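A Bellman equation of this form can be solved numerically by value iteration on a grid of posterior values. The sketch below is not the authors' code: it assumes Gaussian densities $f_0 = \mathcal{N}(0,1)$ and $f_1 = \mathcal{N}(\mu_1,1)$, approximates the expectation in $B_1$ by a weighted sum over a truncated grid of observation values, and uses linear interpolation of $J$. The function name and the grid sizes are hypothetical choices.

```python
import numpy as np

def value_iteration(mu1, rho, lam_f, lam_e, n_p=201, n_x=161, iters=300):
    """Approximate J on a grid of p values; returns (p, J, d) with d = B0 - B1."""
    p = np.linspace(0.0, 1.0, n_p)
    x = np.linspace(-8.0, 8.0 + mu1, n_x)      # truncated observation grid (assumption)
    f0 = np.exp(-0.5 * x ** 2)                  # unnormalized N(0,1); constants cancel below
    f1 = np.exp(-0.5 * (x - mu1) ** 2)          # unnormalized N(mu1,1)
    q = (p + (1.0 - p) * rho)[:, None]          # predicted posterior, eq. (3)
    joint1 = q * f1[None, :]
    joint0 = (1.0 - q) * f0[None, :]
    mix = joint1 + joint0                       # predictive density of the next observation
    post = joint1 / mix                         # posterior after observing x, eq. (4)
    w = mix / mix.sum(axis=1, keepdims=True)    # quadrature weights, one row per grid point p
    J = np.zeros(n_p)
    for _ in range(iters):
        B0 = np.interp(q[:, 0], p, J)
        B1 = (w * np.interp(post.ravel(), p, J).reshape(post.shape)).sum(axis=1)
        A_J = np.minimum(B0, lam_e * (1.0 - p) + B1)
        J = np.minimum(lam_f * (1.0 - p), p + A_J)
    # d is computed from the last backup, so it lags J by one iteration
    return p, J, B0 - B1
```

Comparing the returned `d` with `lam_e * (1 - p)`, and `J` with `lam_f * (1 - p)`, gives numerical estimates of where the observation and stopping decisions switch.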
It can be shown by an induction argument (see, e.g., [16]) that $J$, $B_0$ and $B_1$ are all non-negative concave
functions on the interval $[0, 1]$, and that $J(1) = B_0(1) = B_1(1) = 0$. Also, by Jensen's inequality

$$B_1(p) \le J\left(E[\Phi^{(1)}(X, p)]\right) = B_0(p), \quad p \in [0, 1].$$

Let

$$d(p_k) = B_0(p_k) - B_1(p_k).$$

Then, from the above properties of $J$, $B_0$ and $B_1$, it is easy to show that the optimal policy
$\gamma^* = (\tau^*, \mu_0^*, \mu_1^*, \ldots, \mu_{\tau-1}^*)$ for the problem given in (2) has the following structure:

$$S_{k+1}^* = \mu_k^*(p_k) = \begin{cases} 0 & \text{if } d(p_k) < \lambda_e (1 - p_k), \\ 1 & \text{if } d(p_k) \ge \lambda_e (1 - p_k), \end{cases} \tag{6}$$

$$\tau^* = \inf\{k \ge 1 : p_k > A^*\}.$$

Remark 1. Since $d(p_k) \ge 0$ $\forall p_k$, the algorithm in (6) reduces to the classical Shiryaev algorithm when
$\lambda_e = 0$.

The optimal stopping rule $\tau^*$ is similar to the one of the Shiryaev problem. But, the observation control
is not explicit and one has to evaluate the differential cost function $d(p_k)$ at $p_k$ at each time step to choose
$S_{k+1}$.




(a) $d(p) = B_0(p) - B_1(p)$ and $\lambda_e(1 - p)$ as a function of $p$. (b) $p + A_J(p)$ and $\lambda_f(1 - p)$ as a function of $p$.

Fig. 1: Example where a two-threshold policy is optimal: $f_0 \sim \mathcal{N}(0, 1)$, $f_1 \sim \mathcal{N}(0.75, 1)$, $\rho = 0.05$,
$\lambda_f = 50$, and $\lambda_e = 0.5$. Value iteration: number of iterations = 1500, number of points = 2000.



January 20, 2013 



DRAFT 






In Fig. 1a we plot the differential cost function $d(p) = B_0(p) - B_1(p)$ and $\lambda_e(1 - p)$ as a function
of $p$. We note that, although $B_0(p)$ and $B_1(p)$ are concave in $p$, their difference $d(p)$ is not. Thus, the
line $\lambda_e(1 - p)$ can intersect $d(p)$ at more than two points. However, in Fig. 1a we see that there are
exactly two points of intersection, one at $B = 0.306$ and another at $C = 0.96$. In Fig. 1b we plot the
functions $p + A_J(p)$ and $\lambda_f(1 - p)$ as a function of $p$. This figure shows that the stopping threshold is
$A = 0.8815 < 0.96 = C$. Thus, from Fig. 1a and 1b we see that the optimal policy has two thresholds.
For most of the system parameters we have tried, the cost functions behave in this way, and hence for
these values, the following two-threshold policy is optimal.

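Algorithm 1 (Two-threshold policy: $\gamma(A, B)$). Start with $p_0 = 0$ and use the following control, with
$B < A$, for $k \ge 0$:

$$S_{k+1} = \mu_k(p_k) = \begin{cases} 0 & \text{if } p_k < B, \\ 1 & \text{if } p_k \ge B, \end{cases} \tag{7}$$

$$\tau = \inf\{k \ge 1 : p_k > A\}.$$

The probability $p_k$ is updated using (3) and (4).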
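To make the two-threshold policy concrete, the following Monte Carlo sketch simulates it and estimates ADD, PFA and ANO. It is an illustration under stated assumptions rather than the authors' simulation code: the densities are taken as $f_0 = \mathcal{N}(0,1)$ and $f_1 = \mathcal{N}(\mu_1,1)$, $\pi_0 = 0$, and all function names and the `max_steps` cap are hypothetical.

```python
import math
import random

def simulate_two_threshold(A, B, rho, mu1, rng, max_steps=100000):
    """One run of the policy; returns (tau, change point, observations used before change)."""
    gamma = 1                        # geometric change point on {1, 2, ...}
    while rng.random() >= rho:
        gamma += 1
    p, n_obs_before = 0.0, 0
    for k in range(1, max_steps + 1):
        take = p >= B                # S_k is decided from p_{k-1}, as in (7)
        q = p + (1.0 - p) * rho      # prior-only update (3)
        if take:
            x = rng.gauss(mu1 if k >= gamma else 0.0, 1.0)
            lr = math.exp(mu1 * x - 0.5 * mu1 * mu1)
            p = q * lr / (q * lr + 1.0 - q)   # observation update (4)
            if k < gamma:
                n_obs_before += 1
        else:
            p = q
        if p > A:
            return k, gamma, n_obs_before
    return max_steps, gamma, n_obs_before

def estimate_metrics(A, B, rho, mu1, runs=2000, seed=0):
    """Sample averages approximating (ADD, PFA, ANO)."""
    rng = random.Random(seed)
    delay = false_alarms = obs = 0
    for _ in range(runs):
        tau, gamma, n = simulate_two_threshold(A, B, rho, mu1, rng)
        delay += max(tau - gamma, 0)
        false_alarms += tau < gamma
        obs += n
    return delay / runs, false_alarms / runs, obs / runs
```

Setting `B = 0.0` makes the policy take every observation, which corresponds to the classical Shiryaev algorithm and gives a baseline against which the observation savings can be compared.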

Extensive numerical studies of the Bellman equation (5) also show that there exist choices of $\rho$, $f_0$,
$f_1$, $\lambda_f$ and $\lambda_e$ for which (7) is not optimal. In Fig. 2 we plot one such case. Note from Fig. 2a that again
there are two points of intersection of the plotted curves, one at $B = 0.9315$ and another at $C = 0.973$.
But Fig. 2b shows that $A = 0.986 > 0.973 = C$. Thus, the optimal policy has three thresholds. But,
note that the value of $\rho = 0.7$ is quite large and hence impractical. Also, simulations with these choices
of thresholds show that the ANO is approximately zero. In all the cases we have found, for which the
two-threshold policy is not optimal, the value of $\rho$ is large and ANO is almost zero.

From a practical point of view, even if the two-threshold policy (7) is not optimal, one
would like to use the algorithm for the following reasons. First, as the asymptotic analysis given in
Section III will reveal, if the PFA constraint is moderate to small and the ANO constraint is not very
severe, then the thresholds $A$ and $B$ in $\gamma(A, B)$ can be set independently: the threshold $A$ can be set
based on the constraint $\alpha$ alone, and the threshold $B$ can be set based on the constraint $\beta$ alone. Second, apart
from being simple, the two-threshold algorithm (7) is asymptotically optimal as PFA $\to 0$. Finally,
$\gamma(A, B)$ has good trade-off curves, i.e., the ANO of $\gamma(A, B)$ can be reduced by up to 70% while keeping
the ADD of $\gamma(A, B)$ within 10% of the ADD of the Shiryaev algorithm.

It is interesting to note that a two-threshold algorithm similar to that in (7) was shown to be exactly
optimal in [?] for a different but related problem of quality control where inspection costs are considered
or when the tests are destructive.

(a) $d(p) = B_0(p) - B_1(p)$ and $\lambda_e(1 - p)$ as a function of $p$. (b) $p + A_J(p)$ and $\lambda_f(1 - p)$ as a function of $p$.

Fig. 2: Example where a two-threshold policy is not optimal: $f_0 \sim \mathcal{N}(0, 1)$, $f_1 \sim \mathcal{N}(1, 1)$, $\rho = 0.7$,
$\lambda_f = 100$ and $\lambda_e = 5$. Value iteration: number of iterations = 1500, number of points = 2000.

III. Asymptotic Analysis of $\gamma(A, B)$

In this section we derive asymptotic approximations for ADD, PFA and ANO for the two-threshold
algorithm $\gamma(A, B)$. To that end, we first convert the recursion for $p_k$ (see (3) and (4)) to a form that is
amenable to asymptotic analysis.

Define $Z_k = \log \frac{p_k}{1 - p_k}$ for $k \ge 0$. This new variable $Z_k$ has a one-to-one mapping with $p_k$. By defining

$$a = \log \frac{A}{1 - A}, \qquad b = \log \frac{B}{1 - B},$$

we can write the recursions (3) and (4) in terms of $Z_k$.
For $k \ge 1$,

$$Z_{k+1} = Z_k + \log L(X_{k+1}) + |\log(1 - \rho)| + \log\left(1 + \rho e^{-Z_k}\right), \quad \text{if } Z_k \in [b, a) \tag{8}$$

and

$$Z_{k+1} = Z_k + |\log(1 - \rho)| + \log\left(1 + \rho e^{-Z_k}\right), \quad \text{if } Z_k \notin [b, a) \tag{9}$$

with

$$Z_1 = \log\left(e^{Z_0} + \rho\right) + |\log(1 - \rho)| + \log(L(X_1))\, \mathbb{I}_{\{Z_0 \in [b, a)\}}.$$

Here we have used the fact that $S_{k+1} = 1$ if $p_k \in [B, A)$, and $S_{k+1} = 0$ otherwise (see (7)). The crossing
of thresholds $A$ and $B$ by $p_k$ is equivalent to the crossing of thresholds $a$ and $b$ by $Z_k$. Thus the stopping
time for $\gamma(A, B)$ (equivalently $\gamma(a, b)$ with some abuse of notation) is

$$\tau = \inf\{k \ge 1 : Z_k > a\}.$$
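The equivalence between the posterior recursion and the log-odds recursion can be checked numerically. The sketch below is illustrative only; the helper names `logit`, `z_step` and `p_step` are hypothetical.

```python
import math

def logit(p):
    """Map a posterior probability to its log-odds, Z = log(p / (1 - p))."""
    return math.log(p / (1.0 - p))

def z_step(z, rho, log_lr=0.0):
    """Recursion (8) when log_lr is the log-likelihood ratio, (9) when log_lr = 0."""
    return z + log_lr + abs(math.log(1.0 - rho)) + math.log1p(rho * math.exp(-z))

def p_step(p, rho, lr=1.0):
    """Recursions (3)-(4) on the probability scale; lr = 1 means no observation."""
    q = p + (1.0 - p) * rho
    return q * lr / (q * lr + 1.0 - q)
```

With these helpers, `logit(p_step(p, rho, lr))` and `z_step(logit(p), rho, math.log(lr))` agree up to floating-point error, and `logit(0.98)` and `logit(0.2)` give the log-odds thresholds that correspond to $A = 0.98$ and $B = 0.2$.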

In this section we study the asymptotic behavior of $\gamma(a, b)$ in terms of $Z_k$, under various limits of $a$, $b$
and $\rho$. Specifically, we provide two asymptotic expressions for ADD, one for fixed thresholds $a$, $b$, as
$\rho \to 0$, and another for fixed $b$ and $\rho$, as $a \to \infty$. We also provide, as $a \to \infty$ and $\rho \to 0$, an asymptotic
expression for PFA for fixed $b$. Finally, we also provide asymptotic estimates of the average number of
observations used before (ANO) and after the change point $\Gamma$. Note that the limit of $a \to \infty$ corresponds
to PFA $\to 0$.

Fig. 3 shows a typical evolution of $\gamma(a, b)$, i.e., of $Z_k$ using (8) and (9), starting at time 0. Note that
for $Z_k \in [b, a)$, recursion (8) is employed, while outside that interval, recursion (9), which only uses the
prior $\rho$, is employed. As a result $Z_k$ increases monotonically outside $[b, a)$.




Fig. 3: Evolution of $Z_k$ for $f_0 \sim \mathcal{N}(0, 1)$, $f_1 \sim \mathcal{N}(0.5, 1)$, and $\rho = 0.01$, with thresholds $a = 3.89$, and
$b = -1.38$, corresponding to the $p_k$ thresholds $A = 0.98$ and $B = 0.2$, respectively. Also $Z_0 = b$.

From Fig. 3 again, each time $Z_k$ crosses $b$ from below, it can either increase to $a$ (point $\tau$), or it can
go below $b$ and approach $b$ monotonically from below, at which time it faces a similar set of alternatives.
Thus the passage to threshold $a$ possibly involves multiple cycles of the evolution of $Z_k$ below $b$. We
will show in Section III-C that after the change point $\Gamma$, following a finite number of cycles below $b$,
$Z_k$ grows up to cross $a$, and the time spent on the cycles below $b$ is insignificant as compared to $\tau - \Gamma$,
as $a \to \infty$. In fact we show that, asymptotically, the time to reach $a$ is equal to the time taken by the
classical Shiryaev algorithm to reach $a$. (Note that for the classical Shiryaev algorithm the evolution of
$Z_k$ would be based on (8).)

When $Z_k$ crosses $a$ from below, it does so with an overshoot. Overshoots play a significant role in
the performance of many sequential algorithms (see [18], [20]) and they are central to the performance
of $\gamma(a, b)$ as well. In Section III-B we show that PFA depends on the threshold $a$ and the overshoot
$(Z_\tau - a)$ as $a \to \infty$, but is not a function of the threshold $b$.

The number of observations taken during the detection process is the total time spent by $Z_k$ between
$b$ and $a$. As $a \to \infty$, $Z_k$ crosses $a$ only after the change point $\Gamma$, with high probability. The total number
of observations taken can thus be divided into two parts: the part taken before $\Gamma$ (ANO), which is the
fraction of time $Z_k$ is above $b$ (and hence depends only on $b$), and the part taken after $\Gamma$. In Section III-D
we show that, asymptotically, the average number of observations taken after $\Gamma$ is approximately equal
to the delay itself.

In Section IV we provide approximations using which the asymptotic expressions can be computed and
provide numerical results to demonstrate that under various scenarios, for limiting as well as moderate
values of $a$, $b$, and $\rho$, our asymptotic expressions for ADD, PFA and ANO provide good approximations.
In Section V we use the asymptotic expressions for ADD and PFA to show asymptotic optimality of
$\gamma(a, b)$.

We begin our analysis by first obtaining the asymptotic overshoot distribution for $(Z_\tau - a)$ using
nonlinear renewal theory [18], [19]. As mentioned above, this will be critical to the PFA analysis. For
convenience of reference, in Table I we provide a glossary of important terms used in this paper.

In what follows, we use $E_\ell$ and $P_\ell$ to denote, respectively, the expectation and probability measure when
change happens at time $\ell$. We use $E_\infty$ and $P_\infty$ to denote, respectively, the expectation and probability
measure when the entire sequence is i.i.d. with density $f_0$. Note that $g(x) = o(1)$ as $x \to x_0$ is
used to denote that $g(x) \to 0$ in the specified limit.

A. Asymptotic overshoot 

In this section we characterize the overshoot distribution of $Z_k$ as it crosses $a$ as $a \to \infty$. In analyzing
the trajectory of $Z_k$, it is useful to allow for an arbitrary starting point $Z_0$ (shifting the time axis). We first
combine the recursions in (8) and (9) to get:

$$Z_{k+1} = Z_k + \mathbb{I}_{\{Z_k \ge b\}} \log L(X_{k+1}) + |\log(1 - \rho)| + \log\left(1 + e^{-Z_k}\rho\right).$$
TABLE I: Glossary

Symbol                    Definition/Interpretation
ADD                       Average detection delay
PFA                       Probability of false alarm
ANO                       Average # observations used before change
ANO$_1$                   Average # observations used after change
$\{X_k\}$                 Observation sequence
$p_k$                     a posteriori probability of change
$\tau$                    First time for $p_k$ to cross $A$, or first time for $Z_k$ to cross $a = \log \frac{A}{1-A}$
$\{\eta_k\}$              Slowly changing sequence
$R(x)$, $\bar{r}$         Asymptotic distribution and mean of overshoot when $\sum_{k=1}^{n} Y_k$ crosses a large threshold
$t(x, y)$                 Time for $Z_k$ to reach $y$ starting at $x$ using (9)
$\nu(x, y)$               Time for $Z_k$ to reach $y$ starting at $x$ using (8); also, time for Shiryaev algorithm to reach $y$ starting at $x$
$\nu_b$, $\nu_0$          $\nu(b, a)$ and $\nu(-\infty, a)$
$\lambda$                 Starting at $b$, first time $Z_k$ is outside $[b, a)$
$\Lambda$                 Starting at $b$, first time $Z_k$ crosses $a$ or crosses $b$ from below
ADD$^s$                   Starting at $b$, time for $Z_k$ to reach $a$ under $P_1$, when $Z_k$ is reset to $b$ each time it crosses $b$ from below
$\lambda(x)$              Starting at $x \ge b$, first time $Z_k$ is outside $[b, a)$
$\Lambda(x)$              Starting at $x \ge b$, first time $Z_k$ crosses $a$ or crosses $b$ from below
$\hat{\lambda}$           Starting at $b$, first time $Z_k < b$ with $a = \infty$
$\hat{\lambda}(x)$        Starting at $x \ge b$, first time $Z_k < b$ with $a = \infty$
$T_b$                     Time spent by $Z_k$ below $b$, after $\Gamma$, when $\tau \ge \Gamma$
$\tilde{\Lambda}_x$       Starting at $x \ge b$, first time $Z_k > a$, or crosses $b$ from below, or is stopped by occurrence of change
$\delta_x$                The fraction of time $Z_k$ is above $b$, when stopped by $\tilde{\Lambda}_x$
$\hat{\nu}_b$             Starting at $b$, time for $Z_k$ to reach $a$, when $Z_k$ is reflected at $b$ (reset to $b$ when it crosses $b$ from below)



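By defining $Y_k = \log L(X_k) + |\log(1 - \rho)|$ and expanding the above recursion, we can write an expression
for $Z_n$:

$$Z_n = \sum_{k=1}^{n} Y_k + \log\left(e^{Z_0} + \rho\right) + \sum_{k=1}^{n-1} \log\left(1 + e^{-Z_k}\rho\right) - \sum_{k=1}^{n} \mathbb{I}_{\{Z_{k-1} < b\}} \log L(X_k) = \sum_{k=1}^{n} Y_k + \eta_n. \tag{10}$$

Here $\eta_n$ is used to represent all terms other than the first in the equation above:

$$\eta_n = \log\left(e^{Z_0} + \rho\right) + \sum_{k=1}^{n-1} \log\left(1 + e^{-Z_k}\rho\right) - \sum_{k=1}^{n} \mathbb{I}_{\{Z_{k-1} < b\}} \log L(X_k). \tag{11}$$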
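The decomposition of $Z_n$ into a random walk plus the remainder $\eta_n$ can be verified numerically. The sketch below runs the combined recursion (with $a = \infty$, so only the threshold $b$ matters) and accumulates the two pieces separately. It assumes Gaussian post-change observations $\mathcal{N}(\mu_1, 1)$ and attaches the indicator for $\log L(X_k)$ to $Z_{k-1}$, as the recursion dictates; the function name is hypothetical.

```python
import math
import random

def check_decomposition(n, b, rho, mu1, z0, seed=0):
    """Return (Z_n from the recursion, sum of Y_k plus eta_n) for one sample path."""
    rng = random.Random(seed)
    c = abs(math.log(1.0 - rho))
    z = z0
    sum_y = 0.0
    eta = math.log(math.exp(z0) + rho)            # first term of eta_n
    for k in range(1, n + 1):
        x = rng.gauss(mu1, 1.0)
        log_lr = mu1 * x - 0.5 * mu1 ** 2
        observed = z >= b                         # indicator evaluated at Z_{k-1}
        if k >= 2:
            eta += math.log1p(rho * math.exp(-z))  # log(1 + e^{-Z_{k-1}} rho), k-1 >= 1
        if not observed:
            eta -= log_lr                         # skipped observation is removed from the walk
        z = z + (log_lr if observed else 0.0) + c + math.log1p(rho * math.exp(-z))
        sum_y += log_lr + c                       # Y_k = log L(X_k) + |log(1 - rho)|
    return z, sum_y + eta
```

The two returned values should coincide up to floating-point error for any path, which is a quick sanity check on the index bookkeeping in the expansion.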

As defined in [18], $\eta_n$ is a slowly changing sequence if

$$n^{-1} \max\{|\eta_1|, \ldots, |\eta_n|\} \xrightarrow[\text{i.p.}]{n \to \infty} 0, \tag{12}$$

and for every $\epsilon > 0$, there exist $n^*$ and $\delta > 0$ such that for all $n \ge n^*$

$$P\left\{ \max_{1 \le k \le n\delta} |\eta_{n+k} - \eta_n| > \epsilon \right\} < \epsilon. \tag{13}$$

If indeed $\{\eta_n\}$ is a slowly changing sequence, then the distribution of $Z_\tau - a$, as $a \to \infty$, is equal to
the asymptotic distribution of the overshoot when the random walk $\sum_{k=1}^{n} Y_k$ crosses a large positive
boundary. We have the following result.
Theorem 1. Let $R(x)$ be the asymptotic distribution of the overshoot when the random walk $\sum_{k=1}^{n} Y_k$
crosses a large positive boundary under $P_1$. Then for fixed $\rho$ and $b$, under $P_1$, we have the following:

1) $\{\eta_n\}$ is a slowly changing sequence.

2) $R(x)$ is the distribution of $Z_\tau - a$ as $a \to \infty$, i.e.,

$$\lim_{a \to \infty} P_1\left[Z_\tau - a \le x \mid \tau \ge \Gamma\right] = R(x). \tag{14}$$



Proof: When $b = -\infty$, $Z_k$ evolves as in the classical Shiryaev algorithm statistic, and it is easy to
see that in this case:

$$\eta_n = \log\left(e^{Z_0} + \rho\right) + \sum_{k=1}^{n-1} \log\left(1 + e^{-Z_k}\rho\right) = \log\left[ e^{Z_0} + \sum_{k=0}^{n-1} \rho\,(1-\rho)^k \prod_{i=1}^{k} \frac{1}{L(X_i)} \right].$$

It was shown in [?] that this sequence (for $b = -\infty$), with $Z_0 = -\infty$, is a slowly changing
sequence. It is easy to show that $\{\eta_n\}$ is a slowly changing sequence even if $Z_0$ is a random variable.
Also, if $L_z$ is the last time $Z_k$ crosses $b$ from below, then note that, after $L_z$, the last term
$\sum_{k=1}^{n} \mathbb{I}_{\{Z_{k-1} < b\}} \log L(X_k)$ in (11) vanishes, and $\eta_n$ in (11) behaves like the $\eta_n$ for $b = -\infty$. We prove the
theorem using these observations. The detailed proof is given in the appendix to this section. ■

B. PFA Analysis 

We first obtain an expression for PFA as a function of the overshoot when $Z_k$ crosses $a$.

Lemma 1. For fixed $\rho$ and $b$,

$$\mathrm{PFA} = E[1 - p_\tau] = e^{-a} E\left[e^{-(Z_\tau - a)} \mid \tau \ge \Gamma\right](1 + o(1)) \quad \text{as } a \to \infty.$$

Proof: See the appendix for the proof. ■

From Lemma 1 it is evident that PFA depends on the overshoot when $Z_k$ crosses $a$ as $a \to \infty$. Since
the overshoot has an asymptotic distribution (Theorem 1) that depends only on densities $f_0$, $f_1$ and prior
$\rho$, and is independent of $b$, it is natural to expect that as $a \to \infty$, PFA is completely characterized by
the asymptotic distribution $R(x)$ and is not a function of the threshold $b$. This is indeed true and is
established in the following theorem.
Theorem 2. For a fixed $b$ and $\rho$,

$$\mathrm{PFA}(\gamma(a, b)) = \left( e^{-a} \int_0^\infty e^{-x}\, dR(x) \right)(1 + o(1)) \quad \text{as } a \to \infty. \tag{15}$$

Proof: The proof is provided in the appendix. ■
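One hedged way to evaluate the integral in (15) numerically is to estimate $E[e^{-R}]$ by simulating the overshoot of the random walk $\sum_k Y_k$ over a large boundary under the post-change distribution. The sketch below is not the approximation developed later in the paper: it assumes $f_0 = \mathcal{N}(0,1)$ and $f_1 = \mathcal{N}(\mu_1,1)$, it uses a finite boundary as a stand-in for the asymptotic limit, and its function names are hypothetical.

```python
import math
import random

def overshoot_factor(mu1, rho, boundary=10.0, runs=5000, seed=0):
    """Monte Carlo estimate of E[exp(-overshoot)] for the walk sum of Y_k under f1."""
    rng = random.Random(seed)
    drift = abs(math.log(1.0 - rho))
    total = 0.0
    for _ in range(runs):
        s = 0.0
        while s <= boundary:
            x = rng.gauss(mu1, 1.0)
            s += mu1 * x - 0.5 * mu1 * mu1 + drift   # Y_k = log L(X_k) + |log(1 - rho)|
        total += math.exp(-(s - boundary))
    return total / runs

def pfa_approx(a, mu1, rho, runs=5000, seed=0):
    """Plug-in version of (15): exp(-a) times the estimated overshoot factor."""
    return math.exp(-a) * overshoot_factor(mu1, rho, runs=runs, seed=seed)
```

Because the overshoot is strictly positive, the estimated factor lies in $(0, 1)$, so this approximation is always smaller than $e^{-a}$.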

C. Delay Analysis 

The PFA for $\gamma(a, b)$ has the following bound:

$$\mathrm{PFA} = E[1 - p_\tau] \le 1 - A = \frac{1}{1 + e^{a}} \le e^{-a}. \tag{16}$$
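The steps behind this bound can be spelled out as follows, using only the definition $a = \log \frac{A}{1-A}$ and the fact that the posterior exceeds $A$ at the stopping time:

```latex
\begin{align*}
\mathrm{PFA} &= P(\tau < \Gamma) = E\big[P(\Gamma > \tau \mid I_\tau)\big] = E[1 - p_\tau], \\
a = \log\frac{A}{1-A} \;&\Longrightarrow\; A = \frac{e^{a}}{1+e^{a}}, \qquad 1 - A = \frac{1}{1+e^{a}}, \\
p_\tau > A \;&\Longrightarrow\; E[1 - p_\tau] \le 1 - A = \frac{1}{1+e^{a}} \le e^{-a}.
\end{align*}
```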

Using this upper bound we can show that the ADD of $\gamma(a, b)$ is given by:

$$\mathrm{ADD} = E\left[(\tau - \Gamma)^+\right] = E[\tau - \Gamma \mid \tau \ge \Gamma](1 + o(1)) \quad \text{as } a \to \infty. \tag{17}$$

In the following we provide two different expressions for $E[\tau - \Gamma \mid \tau \ge \Gamma]$. The first one is obtained by
keeping $b$ fixed and taking $\rho \to 0$. This expression will be used to get accurate delay estimates for $\gamma(a, b)$
in Section IV.

Next, we will provide another asymptotic expression for $E[\tau - \Gamma \mid \tau \ge \Gamma]$ for a fixed $b$, $\rho$ and as $a \to \infty$.
We show that in this limit, $E[\tau - \Gamma \mid \tau \ge \Gamma]$ converges to the Shiryaev delay. This fact will be used to
prove the asymptotic optimality of $\gamma(a, b)$ in Section V.

It was discussed in reference to Fig. 3 that each time $Z_k$ crosses $b$ from below, it faces two alternatives:
to cross $a$ without ever coming back to $b$, or to go below $b$ and cross it again from below. It was mentioned
that the passage to the threshold $a$ is through multiple such cycles. Motivated by this we define the
following stopping times $\lambda$ and $\Lambda$:

$$\lambda = \inf\{k \ge 1 : Z_k \notin [b, a),\ Z_0 = b\}, \tag{18}$$

and

$$\Lambda = \inf\{k \ge 1 : Z_k > a \ \text{or} \ \exists\, k \ \text{s.t.} \ Z_{k-1} < b \ \text{and} \ Z_k \ge b,\ Z_0 = b\}. \tag{19}$$

Let $t(x, y)$ be the constant time taken by $Z_k$ to move from $Z_0 = x$ to $y$ using the recursion (9), i.e.,

$$t(x, y) = \inf\{k \ge 0 : Z_k > y,\ Z_0 = x,\ x, y \notin [b, a)\}. \tag{20}$$

Then, we can write $\Lambda$ as a function of $\lambda$ using (20):

$$\Lambda = \left(\lambda + t(Z_\lambda, b)\right) \mathbb{I}_{\{Z_\lambda < b\}} + \lambda\, \mathbb{I}_{\{Z_\lambda > a\}} = \lambda + t(Z_\lambda, b)\, \mathbb{I}_{\{Z_\lambda < b\}}.$$
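Since recursion (9) is deterministic, $t(x, y)$ can be computed by direct iteration. The sketch below works on the odds scale, where (9) reads $e^{Z_{k+1}} = (e^{Z_k} + \rho)/(1 - \rho)$, which avoids overflow when the starting point is $-\infty$. The function name and the `max_steps` guard are hypothetical.

```python
import math

def t_const(x, y, rho, max_steps=10**7):
    """Smallest k >= 0 with Z_k > y when Z_0 = x and Z_k follows recursion (9)."""
    odds = math.exp(x)            # exp(-inf) evaluates to 0.0
    target = math.exp(y)
    k = 0
    while odds <= target:
        odds = (odds + rho) / (1.0 - rho)   # recursion (9) on the odds scale
        k += 1
        if k > max_steps:
            raise RuntimeError("threshold not reached within max_steps")
    return k
```

For example, starting from $Z_0 = -\infty$ with $\rho = 0.01$, the prior alone needs 23 steps to push the posterior above $B = 0.2$, i.e. `t_const(float("-inf"), math.log(0.25), 0.01)` evaluates to 23.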
The significance of these stopping times is as follows. If we start the process at $Z_0 = b$ and reset $Z_k$ to $b$
each time it crosses $b$ from below, then the time taken by $Z_k$ to move from $b$ to $a$ is the sum of a finite but
random number of random variables with distribution of $\Lambda$, say $\Lambda_1, \Lambda_2, \ldots, \Lambda_N$. For $i = 1, \ldots, N - 1$,
$Z_{\lambda_i} < b$, and $Z_{\lambda_N} > a$. Thus the time to reach $a$ in this case is $\sum_{k=1}^{N} \Lambda_k$. Let

$$\mathrm{ADD}^s = E_1\left[\sum_{k=1}^{N} \Lambda_k\right].$$

The behavior of the delay path depends on $Z_\Gamma$, the value of $Z_k$ at the change point $\Gamma$, and how $Z_k$
evolves after that point. We use $\{Z_k \nearrow b\}$ to indicate that $Z_k$ approaches $b$ from below for some $k > \Gamma$,
i.e., $\exists\, k > \Gamma$ s.t. $Z_{k-1} < b$, $Z_k \ge b$, and use $\{Z_k \nearrow a\}$ to represent the event that $Z_k$ crosses $a$ without
ever coming back to $b$, i.e., $Z_k \ge b$, $\forall k > \Gamma$. We define the following three disjoint events:

$$\mathcal{A} = \{Z_\Gamma < b\},$$

$$\mathcal{B} = \{Z_\Gamma \ge b;\ Z_k \nearrow b\},$$

$$\mathcal{C} = \{Z_\Gamma \ge b;\ Z_k \nearrow a\}.$$

Thus, under the event $\mathcal{A}$, the process $Z_k$ starts below $b$ at $\Gamma$, and reaches $a$ after multiple up-crossings
of the threshold $b$. Under the event $\mathcal{B}$, the process $Z_k$ starts above $b$ at $\Gamma$, and crosses $b$ before $a$. It then
has multiple up-crossings of $b$, similar to the case of event $\mathcal{A}$. Under event $\mathcal{C}$, the process $Z_k$ starts above
$b$ at $\Gamma$, and reaches $a$ without ever coming below $b$.
Also define

$$\lambda(x) = \inf\{k \ge 1 : Z_k \notin [b, a),\ Z_0 = x,\ b \le x < a\}, \tag{21}$$

and let $\Lambda(x)$ be defined with $Z_0 = x$ similar to (19). Thus, $\lambda$ and $\lambda(b)$ have the same distribution.
Similarly, $\Lambda$ and $\Lambda(b)$ are identically distributed.

The following theorem gives an asymptotic expression for the conditional delay. 

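Theorem 3. For fixed values of the thresholds $a$, $b$, the conditional delay is given by

$$\begin{aligned} E[\tau - \Gamma \mid \tau \ge \Gamma] = \Big( &\mathrm{ADD}^s \, P(\mathcal{A} \cup \mathcal{B} \mid \tau \ge \Gamma) \\ &+ E[\lambda(Z_\Gamma) \mid \mathcal{C}, \tau \ge \Gamma] \, P(\mathcal{C} \mid \tau \ge \Gamma) \\ &+ E[t(Z_\Gamma, b) \mid \mathcal{A}, \tau \ge \Gamma] \, P(\mathcal{A} \mid \tau \ge \Gamma) \\ &+ E[\Lambda(Z_\Gamma) \mid \mathcal{B}, \tau \ge \Gamma] \, P(\mathcal{B} \mid \tau \ge \Gamma) \Big)(1 + o(1)) \quad \text{as } \rho \to 0. \end{aligned} \tag{22}$$

Proof: The proof is provided in the appendix. ■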
In Section IV we will provide approximations for various terms in (22) to get an accurate estimate of
ADD. In Lemma 2 we provide expressions for $\mathrm{ADD}^s$.

Let $\Psi$ represent the Shiryaev recursion, i.e., updating $Z_k$ using only (8). Define

$$\nu(x, y) = \inf\{k \ge 1 : \Psi(Z_{k-1}) > y,\ Z_0 = x\}. \tag{23}$$

Thus, $\nu(x, y)$ is the time for the Shiryaev algorithm to reach $y$ starting at $x$. Also, define the stopping
times:

$$\nu_b = \nu(b, a), \tag{24}$$

and

$$\nu_0 = \nu(-\infty, a). \tag{25}$$

Note that $\nu_0$ is the stopping time for the classical Shiryaev algorithm [1] and $\nu_b$ is its modified form
which starts at $b$. We have the following asymptotic expression.

Lemma 2. For a fixed $b$ and $\rho$, $\mathrm{ADD}^s$, the average time for $Z_k$ to cross $a$ starting at $b$, under $P_1$, with
$Z_k$ reset to $b$ each time it crosses $b$ from below, is given by

$$\mathrm{ADD}^s = \frac{E_1[\lambda] + E_1[t(Z_\lambda, b) \mid \{Z_\lambda < b\}]\, P_1(Z_\lambda < b)}{P_1(Z_\lambda > a)} \tag{26}$$

and is asymptotically equal to the time taken by the Shiryaev algorithm to move from $b$ to $a$, i.e.,

$$\mathrm{ADD}^s = E_1[\nu_b](1 + o(1)) \quad \text{as } a \to \infty. \tag{27}$$



Proof: We have

$$
\mathrm{ADD}^s = E_1\left[\sum_{k=1}^{N} \Lambda_k\right]
\overset{(i)}{=} E_1[N]\, E_1[\Lambda]
\overset{(ii)}{=} \frac{E_1[\Lambda]}{P_1(Z_\lambda > a)}
= \frac{E_1[\lambda] + E_1[t(Z_\lambda, b) \mid \{Z_\lambda < b\}]\, P_1(Z_\lambda < b)}{P_1(Z_\lambda > a)}.
$$

In the above equation, equality (i) follows from Wald's lemma [18], and equality (ii) follows because
$N \sim \mathrm{Geom}(P_1(Z_\lambda > a))$. To obtain (27), the main idea of the proof is to find stopping times which
upper and lower bound the Shiryaev time on average and have delay equal to $\frac{E_1[\lambda]}{P_1(Z_\lambda > a)}$ as $a \to \infty$. The
details are provided in the appendix. ■
Note that Theorem 3 takes $\rho \to 0$. We now provide another expression for $E[\tau - \Gamma \mid \tau \geq \Gamma]$, for a fixed
$b$ and $\rho$ as $a \to \infty$, which will be used to prove the asymptotic optimality of $\gamma(a, b)$ in Section V.

Theorem 4. For a fixed $b$ and $\rho$, we have as $a \to \infty$

$$E[\tau - \Gamma \mid \tau \geq \Gamma] \leq \mathrm{ADD}^s (1 + o(1)) \tag{28}$$

and hence, we have

$$E[\tau - \Gamma \mid \tau \geq \Gamma] = \left[\frac{a}{D(f_1, f_0) + |\log(1-\rho)|}\right] (1 + o(1)) \quad \text{as } a \to \infty, \tag{29}$$

where $D(f_1, f_0)$ is the K-L divergence between $f_1$ and $f_0$.



Proof: To get (28), we show that $\mathrm{ADD}^s$ is the dominant term in an upper bound to $E[\tau - \Gamma \mid \tau \geq \Gamma]$
as $a \to \infty$. The steps followed are very similar to those used to obtain (22). The proof is given in the
appendix.

To obtain (29), from Lemma 2 and (28) we have

$$E[\tau - \Gamma \mid \tau \geq \Gamma] \leq E_1[\nu_b](1 + o(1)) \quad \text{as } a \to \infty.$$

To evaluate $E_1[\nu_b]$, following steps similar to those in Section III-A, it is easy to show that the evolution
of $Z_k$ from $b$ to $a$, with $Z_0 = b$, is according to the random walk $\sum_{i=1}^{k} \left[\log L(X_i) + |\log(1-\rho)|\right]$ and a
slowly changing term. Thus, according to Lemma 9.1.3, p. 191 of [18],

$$E_1[\nu_b] = \left[\frac{a}{D(f_1, f_0) + |\log(1-\rho)|}\right] (1 + o(1)) \quad \text{as } a \to \infty,$$

and

$$E[\tau - \Gamma \mid \tau \geq \Gamma] \leq \left[\frac{a}{D(f_1, f_0) + |\log(1-\rho)|}\right] (1 + o(1)) \quad \text{as } a \to \infty.$$

To complete the proof of Theorem 4 we now show that $E[\tau - \Gamma \mid \tau \geq \Gamma]$ is asymptotically lower
bounded by $E_1[\nu_b]$. From Theorem 1 in [20],

$$E[\nu_0 - \Gamma \mid \nu_0 \geq \Gamma] \geq \left[\frac{a}{D(f_1, f_0) + |\log(1-\rho)|}\right] (1 + o(1)) \quad \text{as } a \to \infty.$$

Also, from Theorem 2,

$$P[\tau < \Gamma] = P[\nu_0 < \Gamma](1 + o(1)) \quad \text{as } a \to \infty.$$

Thus, we have

$$E[\tau - \Gamma \mid \tau \geq \Gamma] \geq E[\nu_0 - \Gamma \mid \nu_0 \geq \Gamma](1 + o(1)) \quad \text{as } a \to \infty.$$

This is true because the Shiryaev algorithm is optimal for problem (1) with $\beta = \infty$. This completes the
proof. ■
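D. Computation of ANO

First note that

$$
\mathrm{ANO} = E\left[\sum_{k=1}^{\min\{\tau,\, \Gamma-1\}} S_k\right]
= E\left[\sum_{k=1}^{\Gamma-1} S_k \,\Big|\, \tau \geq \Gamma\right] P(\tau \geq \Gamma)
+ E\left[\sum_{k=1}^{\tau} S_k \,\Big|\, \tau < \Gamma\right] P(\tau < \Gamma)
= E\left[\sum_{k=1}^{\Gamma-1} S_k \,\Big|\, \tau \geq \Gamma\right] (1 + o(1)) \quad \text{as } a \to \infty.
$$

The last equality follows because $\sum_{k=1}^{\tau} S_k < \Gamma$ on $\{\tau < \Gamma\}$, and $P(\tau < \Gamma) \leq e^{-a} \to 0$ as $a \to \infty$.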
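Following (18), we define

$$\hat{\lambda} = \inf\{k \geq 1 : Z_k < b,\ Z_0 = b,\ a = \infty\}. \tag{30}$$

The theorem below gives an asymptotic expression for ANO.

Theorem 5. For fixed $b$, we have as $a \to \infty$, and as $\rho \to 0$,

$$\mathrm{ANO} = \frac{E_\infty[\hat{\lambda}]}{P_\infty[\Gamma \leq \hat{\lambda} + t(Z_{\hat{\lambda}}, b)]} \cdot \frac{1}{1 + e^{b}}\, (1 + o(1)),$$

where $\hat{\lambda}$ is as defined in (30).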



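Proof: Let $t(b)$ be the first time $Z_k$ crosses $b$ from below, i.e., $t(b) = t(z_0, b)$. Using the fact that
observations are used only after $t(b)$, we can write the following:

$$
\mathrm{ANO} \approx E\left[\sum_{k=1}^{\Gamma-1} S_k \,\Big|\, \tau \geq \Gamma\right]
= E\left[\sum_{k=t(b)}^{\Gamma-1} S_k \,\Big|\, \Gamma > t(b),\ \tau \geq \Gamma\right] P(\Gamma > t(b) \mid \tau \geq \Gamma). \tag{31}
$$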



We now compute each of the two terms in (31). For the first term in (31), we have the following lemma.

Lemma 3. For a fixed $b$, as $a \to \infty$, $\rho \to 0$,

$$
E\left[\sum_{k=t(b)}^{\Gamma-1} S_k \,\Big|\, \Gamma > t(b),\ \tau \geq \Gamma\right]
= \frac{E_\infty[\hat{\lambda}]}{P_\infty[\Gamma \leq \hat{\lambda} + t(Z_{\hat{\lambda}}, b)]}\, (1 + o(1)).
$$

Proof: Note that

$$
\lim_{a \to \infty} E\left[\sum_{k=t(b)}^{\Gamma-1} S_k \,\Big|\, \Gamma > t(b),\ \tau \geq \Gamma\right]
= E\left[\sum_{k=t(b)}^{\Gamma-1} S_k \,\Big|\, \Gamma > t(b),\ a = \infty\right].
$$

To compute the right-hand side of the above equation, note that conditioned on $\{\Gamma > t(b)\}$, $\sum_{k=t(b)}^{\Gamma-1} S_k$
is approximately the number of observations used when the process $Z_k$ starts at $Z_0 = b$, goes through
multiple cycles below $b$, with each cycle length having the distribution of $\hat{\lambda}$, and the sequence of cycles is
interrupted by the occurrence of the change. See the appendix for the detailed proof. ■
For the second term in (31), we show that $P(\Gamma > t(b) \mid \tau \geq \Gamma)$ is equal to $\frac{1}{1+e^{b}}$ in the limit and is
independent of $z_0$.

Lemma 4.

$$P(\Gamma > t(b) \mid \tau \geq \Gamma) = \frac{1}{1 + e^{b}} + o(1) \quad \text{as } a \to \infty,\ \rho \to 0.$$

Proof: The proof is provided in the appendix.
Lemmas 3 and 4 taken together complete the proof of Theorem 5.

Define

$$\mathrm{ANO}_1 = E\left[\sum_{k=\Gamma}^{\tau} S_k \,\Big|\, \tau \geq \Gamma\right].$$

Thus, $\mathrm{ANO}_1$ is the average number of observations used after the change point $\Gamma$. In some applications
it might be of interest to have an estimate of $\mathrm{ANO}_1$ as well. The following theorem shows that $\mathrm{ANO}_1$
is approximately equal to the delay itself.

Theorem 6. For fixed $b$ and $\rho$, we have

$$\mathrm{ANO}_1 = E_1[\nu_b] (1 + o(1)) \quad \text{as } a \to \infty.$$

Proof: The number of observations used after $\Gamma$ can be written as the difference between the time
for $Z_k$ to reach $a$ and the time spent by it below $b$. For this we define the variable

$$T_b = E\left[\sum_{k=\Gamma}^{\tau} (1 - S_k) \,\Big|\, \tau \geq \Gamma\right].$$

Thus

$$\mathrm{ANO}_1 = E[\tau - \Gamma \mid \tau \geq \Gamma] - T_b + 1.$$

We know from Theorem 4 that $E[\tau - \Gamma \mid \tau \geq \Gamma] \approx E_1[\nu_b]$. As $a \to \infty$, $T_b$ converges, and therefore
$\mathrm{ANO}_1 \sim E_1[\nu_b]$ for large $a$ as well. The detailed proof is given in the appendix. ■



IV. Approximations and Numerical results 

In Sections III-B to III-D we have obtained asymptotic expressions for ADD, PFA, and ANO as a
function of the system parameters: the thresholds $a$, $b$, the densities $f_0$ and $f_1$, and the prior $\rho$. We
now provide approximations for some of the analytical expressions obtained in these sections, and also
provide numerical results to validate the analysis. The observations are assumed to be Gaussian with
$f_0 \sim \mathcal{N}(0, 1)$ and $f_1 \sim \mathcal{N}(\theta, 1)$, $\theta > 0$, for the simulations and analysis. In the simulations, the PFA
values are computed using the expression $E[1 - p_\tau]$. This guarantees a faster convergence for small values
of PFA.
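The simulation setup described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the run count, and the seed are assumptions made here, the change point is taken to be geometric with $P(\Gamma = k) = \rho(1-\rho)^{k-1}$, $p_0 = 0$, and an observation is taken at time $k+1$ exactly when $Z_k \in [b, a)$. PFA is estimated through $E[1 - p_\tau]$ as in the text.

```python
import math
import random


def run_two_threshold(rho, theta, a, b, rng):
    """Simulate one run of the two-threshold rule in the Z = log(p / (1 - p)) domain.

    Returns (tau, change_point, obs_before_change, one_minus_p_tau).
    """
    # Geometric change point: P(Gamma = k) = rho * (1 - rho)**(k - 1), k >= 1.
    change_point = 1
    while rng.random() >= rho:
        change_point += 1

    z = -math.inf            # p_0 = 0
    k = 0
    obs_before_change = 0
    while z < a:
        take_obs = z >= b    # observe X_{k+1} only if Z_k lies in [b, a)
        k += 1
        # Prior-only update p -> p + (1 - p) * rho, written for Z.
        z = math.log(math.exp(z) + rho) - math.log1p(-rho)
        if take_obs:
            mean = theta if k >= change_point else 0.0
            x = rng.gauss(mean, 1.0)
            z += theta * x - 0.5 * theta * theta   # log-likelihood ratio of N(theta,1) vs N(0,1)
            if k < change_point:
                obs_before_change += 1
    return k, change_point, obs_before_change, 1.0 / (1.0 + math.exp(z))


def estimate(rho, theta, a, b, runs=2000, seed=0):
    """Monte Carlo estimates of PFA (via E[1 - p_tau]), ANO and the conditional delay."""
    rng = random.Random(seed)
    pfa = ano = delay = 0.0
    detections = 0
    for _ in range(runs):
        tau, gamma, n_obs, one_minus_p = run_two_threshold(rho, theta, a, b, rng)
        pfa += one_minus_p
        ano += n_obs
        if tau >= gamma:
            detections += 1
            delay += tau - gamma
    return {"PFA": pfa / runs, "ANO": ano / runs,
            "cond_delay": delay / max(detections, 1)}


if __name__ == "__main__":
    print(estimate(rho=0.05, theta=0.75, a=5.0, b=1.0))
```

Setting `b = -math.inf` makes every sample an observation, which in this sketch corresponds to the classical Shiryaev test.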



A. Numerical results for PFA 

By Theorem 2 we have the following approximation for PFA:

$$\mathrm{PFA} \approx e^{-a} \int_0^\infty e^{-x}\, dR(x).$$

We note that $\int_0^\infty e^{-x}\, dR(x)$ and $\bar{r}$ can be computed numerically, at least for Gaussian observations [18].
In this section we provide numerical results to show the accuracy of the above expression for PFA.

In Table II we compare the analytical approximation with the PFA obtained using simulations of
$\gamma(a, b)$ for various choices of $\rho$, thresholds $a$, $b$, and post-change mean $\theta$. From the table we see that the
analytical approximation is quite good.

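TABLE II: PFA for $f_0 \sim \mathcal{N}(0, 1)$, $f_1 \sim \mathcal{N}(\theta, 1)$

| $\theta$ | $\rho$ | $a$ | $b$ | PFA (Simulations) | PFA (Analysis) |
|---|---|---|---|---|---|
| 0.4 | 0.01 | 3.0 | | $3.78 \times 10^{-2}$ | $3.94 \times 10^{-2}$ |
| 0.4 | 0.01 | 6.0 | 2.0 | $1.955 \times 10^{-3}$ | $1.96 \times 10^{-3}$ |
| 0.75 | 0.01 | 9.0 | -2.0 | $7.968 \times 10^{-5}$ | $7.964 \times 10^{-5}$ |
| 2.0 | 0.01 | 5.0 | -4.0 | $2.15 \times 10^{-3}$ | $2.155 \times 10^{-3}$ |
| 0.75 | 0.005 | 7.6 | 3.0 | $3.231 \times 10^{-4}$ | $3.235 \times 10^{-4}$ |
| 0.75 | 0.1 | 4.0 | -3.0 | $1.143 \times 10^{-2}$ | $1.157 \times 10^{-2}$ |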



In Table III we show that PFA is not a function of $b$ for large values of $a$. We fix $a = 4.6$ and increase
$b$ from $-2.2$ to $0.85$. The simulated PFA is unchanged when $b$ is changed this way. The analysis
captures this behavior and is quite accurate.

B. Approximations and numerical results for ANO and ANOi 

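We recall the expressions for ANO from Theorem 5 and for $\mathrm{ANO}_1$ from Theorem 6:

$$\mathrm{ANO} \approx \frac{E_\infty[\hat{\lambda}]}{P_\infty[\Gamma \leq \hat{\lambda} + t(Z_{\hat{\lambda}}, b)]} \cdot \frac{1}{1 + e^{b}},$$

$$\mathrm{ANO}_1 \approx E_1[\nu_b].$$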
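TABLE III: PFA for $\rho = 0.01$, $f_0 \sim \mathcal{N}(0, 1)$, $f_1 \sim \mathcal{N}(0.75, 1)$

| $a$ | $b$ | Simulations | Analysis |
|---|---|---|---|
| 4.6 | -2.2 | $6.44 \times 10^{-3}$ | $6.48 \times 10^{-3}$ |
| 4.6 | -1.5 | $6.44 \times 10^{-3}$ | $6.48 \times 10^{-3}$ |
| 4.6 | -0.85 | $6.44 \times 10^{-3}$ | $6.48 \times 10^{-3}$ |
| 4.6 | 0 | $6.44 \times 10^{-3}$ | $6.48 \times 10^{-3}$ |
| 4.6 | 0.85 | $6.44 \times 10^{-3}$ | $6.48 \times 10^{-3}$ |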



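We first simplify the expression for ANO. Note that

$$P_\infty[\Gamma \leq \hat{\lambda} + t(Z_{\hat{\lambda}}, b)] = 1 - P_\infty[\Gamma > \hat{\lambda} + t(Z_{\hat{\lambda}}, b)] = 1 - E_\infty\left[(1-\rho)^{\hat{\lambda} + t(Z_{\hat{\lambda}}, b)}\right].$$

Thus, using the binomial approximation we get

$$P_\infty[\Gamma \leq \hat{\lambda} + t(Z_{\hat{\lambda}}, b)] \approx \rho \left(E_\infty[\hat{\lambda}] + E_\infty[t(Z_{\hat{\lambda}}, b)]\right).$$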
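The binomial approximation used above replaces $1 - (1-\rho)^n$ by $\rho n$, which is accurate when $\rho n$ is small. The short check below illustrates this for a deterministic number of steps $n$ (the text applies it to a random number of steps inside an expectation); the function name and the chosen values of $\rho$ and $n$ are illustrative assumptions only.

```python
def exact_and_linear(rho, n):
    """Return P(Gamma <= n) for a geometric change point and its first-order value rho * n."""
    exact = 1.0 - (1.0 - rho) ** n
    return exact, rho * n


if __name__ == "__main__":
    for n in (5, 20, 50):
        exact, linear = exact_and_linear(0.001, n)
        # The relative error grows with rho * n.
        print(n, exact, linear, (linear - exact) / exact)
```

The linear term always overestimates the exact probability, and the gap is at most $(\rho n)^2 / 2$.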

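Thus, we have

$$\mathrm{ANO} \approx \frac{\rho^{-1}\, E_\infty[\hat{\lambda}]}{E_\infty[\hat{\lambda}] + E_\infty[t(Z_{\hat{\lambda}}, b)]} \cdot \frac{1}{1 + e^{b}}. \tag{32}$$

We now provide approximations to compute $E_\infty[\hat{\lambda}]$ and $E_\infty[t(Z_{\hat{\lambda}}, b)]$ in (32). Invoking Wald's lemma
[18], we write $E_\infty[\hat{\lambda}]$ as

$$E_\infty[\hat{\lambda}] = \frac{E_\infty[Z_{\hat{\lambda}}] - E_\infty[\eta_{\hat{\lambda}}]}{-D(f_1, f_0) + |\log(1-\rho)|}.$$

We have developed the following approximation for $E_\infty[\hat{\lambda}]$:

$$E_\infty[\hat{\lambda}] \approx \frac{\bar{r} + \log(1 + \rho e^{-b})}{D(f_1, f_0) - |\log(1-\rho)|}.$$

Here, $\log(1 + \rho e^{-b})$ is an approximation to $E_\infty[\eta_{\hat{\lambda}}]$ obtained by ignoring all the random terms after $b$ is factored
out of it. This extra $b$ will cancel with the $b$ in $E_\infty[Z_{\hat{\lambda}}] = b + E_\infty[Z_{\hat{\lambda}} - b]$. We approximate $E_\infty[b - Z_{\hat{\lambda}}]$
by $\bar{r}$, the mean overshoot of the underlying random walk, with mean $D(f_1, f_0) - |\log(1-\rho)|$, when it
crosses a large boundary (see (10)).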

For the term $E_\infty[t(Z_{\hat{\lambda}}, b)]$, we have the following lemma.

Lemma 5. For fixed values of $x$ and $y$, we have

$$t(x, y) = \left\lceil \frac{\log(1 + e^{y}) - \log(1 + e^{x})}{|\log(1-\rho)|} \right\rceil. \tag{34}$$

Proof: The proof is provided in the appendix.
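The closed form for $t(x, y)$ can be checked against a direct iteration of the update used when no observation is taken. The sketch below assumes that this update is $p \to p + (1-p)\rho$ (so that $1 - p_k = (1-p_0)(1-\rho)^k$), that $x < y$, and that $t(x, y)$ is the first $k$ with $Z_k \geq y$; the function names are hypothetical.

```python
import math


def t_closed_form(x, y, rho):
    """Number of prior-only steps needed to move Z from x up to y (x < y)."""
    numerator = math.log1p(math.exp(y)) - math.log1p(math.exp(x))
    return math.ceil(numerator / abs(math.log1p(-rho)))


def t_by_iteration(x, y, rho):
    """Same quantity obtained by iterating the prior-only update of Z."""
    z, k = x, 0
    while z < y:
        z = math.log(math.exp(z) + rho) - math.log1p(-rho)
        k += 1
    return k


if __name__ == "__main__":
    print(t_closed_form(-3.0, 0.0, 0.01), t_by_iteration(-3.0, 0.0, 0.01))
```

Both functions also accept `x = -math.inf`, which corresponds to starting from $p_0 = 0$.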
We use (34) to get the following approximation:

$$E_\infty[t(Z_{\hat{\lambda}}, b)] \approx \int_0^\infty \frac{\log(1 + e^{b}) - \log(1 + e^{b-x})}{|\log(1-\rho)|}\, dR(x). \tag{35}$$

Thus, we approximate the distribution of $(b - Z_{\hat{\lambda}})$ by $R(x)$.

Based on the second order approximation for $E_1[\nu_0]$ developed in [20], we have obtained the following
approximation for $E_1[\nu_b]$:

$$E_1[\nu_b] = \frac{a - E[\eta(b)] + \bar{r}}{D(f_1, f_0) + |\log(1-\rho)|} + o(1) \quad \text{as } a \to \infty, \tag{36}$$

where $\eta(b)$ is the a.s. limit of the slowly changing sequence $\eta_n$ with $Z_0 = b$ under $f_1$ (see (11)), and

$$\bar{r} = \int_0^\infty x\, dR(x), \tag{37}$$

with $R(x)$ as in Theorem 1.

In Table IV we demonstrate the accuracy of the approximations for ANO and $\mathrm{ANO}_1$, for various values
of $\rho$, thresholds $a$, $b$, and post-change mean $\theta$. The table shows that the approximations are quite accurate
for the parameters chosen.



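TABLE IV: $f_0 \sim \mathcal{N}(0, 1)$, $f_1 \sim \mathcal{N}(\theta, 1)$

| $\theta$ | $\rho$ | $a$ | $b$ | ANO (Simulations) | ANO (Analysis) | $\mathrm{ANO}_1$ (Simulations) | $\mathrm{ANO}_1$ (Analysis) |
|---|---|---|---|---|---|---|---|
| 0.4 | 0.01 | 8.5 | -2.2 | 66.3 | 62.88 | 102.9 | 111.7 |
| 0.75 | 0.01 | 6.467 | -2.2 | 34.92 | 34.24 | 27.86 | 29.46 |
| 2.0 | 0.01 | 7.5 | -4.0 | 42.94 | 46.4 | 6.08 | 6.23 |
| 0.75 | 0.005 | 8.7 | -3.0 | 77.18 | 75.09 | 38.73 | 40.38 |
| 0.75 | 0.1 | 8.5 | 0.0 | 2.64 | 3.2 | 21.17 | 22.18 |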



C. Approximations and numerical results for ADD 

Theorem 4 gave a first order approximation for $E[\tau - \Gamma \mid \tau \geq \Gamma]$:

$$E[\tau - \Gamma \mid \tau \geq \Gamma] \approx \frac{a}{D(f_1, f_0) + |\log(1-\rho)|}.$$

Note that, from [20], this is also the first order approximation for the ADD of the Shiryaev algorithm,
and gives a good estimate of the delay when PFA is small. For the Shiryaev delay, a second order
approximation was developed in [20] (also see (36)):

$$E_1[\nu_0] = \frac{a - E[\eta(-\infty)] + \bar{r}}{D(f_1, f_0) + |\log(1-\rho)|} + o(1) \quad \text{as } a \to \infty.$$
So, instead of using $\frac{a}{D(f_1, f_0) + |\log(1-\rho)|}$, we propose to use the following:

$$E[\tau - \Gamma \mid \tau \geq \Gamma] \approx \frac{a - E[\eta(-\infty)] + \bar{r}}{D(f_1, f_0) + |\log(1-\rho)|}. \tag{38}$$

For the Shiryaev algorithm, (38) provides a very good estimate of the delay even for moderate values
of PFA. In the case of $\gamma(a, b)$, the accuracy of (38) depends on the choice of $b$ and hence on the constraint
$\beta$, as having $b > -\infty$ increases the delay. Before we demonstrate this through numerical and simulation
results we introduce the following concept:

$$\mathrm{ANO}\% = \mathrm{ANO} \text{ expressed as a percentage of } E[\Gamma]. \tag{39}$$

For example, if $\rho = 0.05$ (so that $E[\Gamma] = 1/\rho = 20$), and for some choice of system parameters ANO = 15, then ANO% = 15 × 0.05 = 0.75,
i.e., 75%. Thus, the concept of ANO% captures the reduction in the average number of observations
used before change by employing $\gamma(a, b)$.
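The definition in (39) amounts to a one-line computation. The helper below is a small sketch under the assumption that the change point is geometric with parameter $\rho$, so that $E[\Gamma] = 1/\rho$; the function name is illustrative.

```python
def ano_percent(ano, rho):
    """ANO expressed as a percentage of E[Gamma] = 1 / rho (geometric change point)."""
    return 100.0 * ano * rho


if __name__ == "__main__":
    # The example from the text: rho = 0.05 and ANO = 15.
    print(ano_percent(15, 0.05))
```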

In Table V we provide various numerical examples where (38) is a good approximation for $E[\tau - \Gamma \mid \tau \geq \Gamma]$.
Since (38) is a good approximation for the Shiryaev delay as well, it follows that, for these parameter
values, the delay of $\gamma(a, b)$ is approximately equal to the Shiryaev delay. It might be intuitive that if we
are aiming for large ANO% values of, say, 90%, then the delay will be close to the Shiryaev delay. But
the values in Table V show that it is possible to achieve considerably smaller values of ANO% without
significantly affecting the delay.



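TABLE V: $f_0 \sim \mathcal{N}(0, 1)$, $f_1 \sim \mathcal{N}(\theta, 1)$

| $\theta$ | $\rho$ | $a$ | $b$ | ADD (Simulations) $E[\tau - \Gamma \mid \tau \geq \Gamma]$ | ADD (Analysis) (38) | PFA (Simulations) | PFA (Analysis) | ANO% |
|---|---|---|---|---|---|---|---|---|
| 0.4 | 0.01 | 8.5 | -2.2 | 104.9 | 111.7 | $1.608 \times 10^{-4}$ | $1.608 \times 10^{-4}$ | 66% |
| 0.75 | 0.01 | 6.467 | -2.2 | 32.3 | 29.5 | $1.002 \times 10^{-3}$ | $1.004 \times 10^{-3}$ | 35% |
| 2.0 | 0.01 | 7.5 | -4.0 | 6.1 | 6.23 | $1.77 \times 10^{-4}$ | $1.768 \times 10^{-4}$ | 43% |
| 0.75 | 0.005 | 8.7 | -3.0 | 42.6 | 40.4 | $1.076 \times 10^{-4}$ | $1.076 \times 10^{-4}$ | 77% |
| 0.75 | 0.1 | 8.5 | 0.0 | 23.9 | 22.18 | $1.286 \times 10^{-4}$ | $1.285 \times 10^{-4}$ | 26% |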



However, if the ANO% value is small, then this means that the value of $b$ is large, and further that
the delay is large. In this case, it might happen that (38) is a good approximation only for values of PFA
which are very small. This is demonstrated in Table VI. It is clear from the table that, for the parameter
values considered, estimating the delay with less than 10% error is only possible at PFA values of the
order of $10^{-22}$.
TABLE VI: $\rho = 0.05$, $f_0 \sim \mathcal{N}(0, 1)$, $f_1 \sim \mathcal{N}(0.75, 1)$

| $a$ | $b$ | Simulations $E[\tau - \Gamma \mid \tau \geq \Gamma]$ | Analysis | ANO% | PFA |
|---|---|---|---|---|---|
| 5.0 | 1.0 | 30 | 13 | 7.5% | $4.3 \times 10^{-3}$ |
| 9.0 | 1.0 | 42 | 25 | 7.5% | $7.9 \times 10^{-5}$ |
| 13.0 | 1.0 | 54 | 37 | 7.5% | $1.4 \times 10^{-6}$ |
| 18.0 | 1.0 | 69 | 52 | 7.5% | $9.7 \times 10^{-9}$ |
| 50.0 | 1.0 | 165 | 149 | 7.5% | $1.23 \times 10^{-22}$ |
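Approximation (38) is linear in $a$ with slope $1 / (D(f_1, f_0) + |\log(1-\rho)|)$, so the increments of the "Analysis" column of Table VI should follow this slope. The sketch below checks this for the table's parameters, using $D(f_1, f_0) = \theta^2/2$ for unit-variance Gaussians. The function name is an assumption, and the table values are rounded to integers, so agreement is only expected to within about one unit.

```python
import math


def first_order_delay(a, theta, rho):
    """a / (D(f1, f0) + |log(1 - rho)|), with D = theta**2 / 2 for unit-variance Gaussians."""
    return a / (0.5 * theta ** 2 + abs(math.log1p(-rho)))


# Thresholds and the rounded "Analysis" column of Table VI (rho = 0.05, theta = 0.75).
A_VALUES = [5.0, 9.0, 13.0, 18.0, 50.0]
ANALYSIS = [13, 25, 37, 52, 149]


def increments(theta=0.75, rho=0.05):
    """Pairs (table increment, slope-based increment) for consecutive rows."""
    pairs = []
    for i in range(len(A_VALUES) - 1):
        table_step = ANALYSIS[i + 1] - ANALYSIS[i]
        slope_step = first_order_delay(A_VALUES[i + 1] - A_VALUES[i], theta, rho)
        pairs.append((table_step, slope_step))
    return pairs


if __name__ == "__main__":
    for table_step, slope_step in increments():
        print(table_step, round(slope_step, 2))
```

The simulated delays in the table grow at about the same rate but sit roughly 17 samples above the analysis column, which is consistent with the text's remark that the relative error only falls below 10% at very large $a$.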



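This motivates the need for a more accurate estimate of the delay, which is provided below.
From Theorem 3 recall that we had the following three events:

$$A = \{Z_\Gamma < b\},$$

$$B = \{Z_\Gamma \geq b;\ Z_k \nearrow b\},$$

$$C = \{Z_\Gamma \geq b;\ Z_k \nearrow a\}.$$

As a first step towards the approximations, we ignore the event $B$: $P(B) \approx 0$. That is, we assume that if
$Z_\Gamma \geq b$, then $Z_k$ climbs to $a$. Define

$$P_b = P(Z_\Gamma \geq b \mid \tau \geq \Gamma).$$

Then

$$E[\tau - \Gamma \mid \tau \geq \Gamma] \approx P_b\, E[\lambda(Z_\Gamma) \mid C, \tau \geq \Gamma] + (1 - P_b)\left(E[t(Z_\Gamma, b) \mid A, \tau \geq \Gamma] + \mathrm{ADD}^s\right). \tag{40}$$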
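From Lemma 2 it is easy to show the following:

$$\mathrm{ADD}^s = E_1[\lambda \mid \{Z_\lambda > a\}] + \left(E_1[\lambda \mid \{Z_\lambda < b\}] + E_1[t(Z_\lambda, b) \mid \{Z_\lambda < b\}]\right) \frac{P_1(Z_\lambda < b)}{1 - P_1(Z_\lambda < b)}.$$

We now use the following approximations:

$$E_1[\lambda \mid \{Z_\lambda > a\}] \approx E[\lambda(Z_\Gamma) \mid C, \tau \geq \Gamma] \approx \frac{a - E[\eta(-\infty)] + \bar{r}}{D(f_1, f_0) + |\log(1-\rho)|},$$

$$E_1[\lambda \mid \{Z_\lambda < b\}] \approx \frac{\bar{r} + \log(1 + \rho e^{-b})}{D(f_1, f_0) - |\log(1-\rho)|},$$

$$E_1[t(Z_\lambda, b) \mid \{Z_\lambda < b\}] \approx t(b - \bar{r}, b) \approx \frac{\log(1 + e^{b}) - \log(1 + e^{b - \bar{r}})}{|\log(1-\rho)|}.$$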
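To compute (40), we also need approximations for $P_1(Z_\lambda < b)$, $P_b$, and $E[t(Z_\Gamma, b) \mid A]$. Those are provided
below. Setting $a = \infty$, $P_1(Z_\lambda < b)$ can be written as an expectation under $P_\infty$ by Wald's likelihood identity
(Proposition 2.24, p. 13 of [18]). Under $P_\infty$, $\hat{\lambda}$ a.s. ends with $Z_k$ crossing below $b$, and with high probability it takes
very small values. Hence, this expression can be computed using Monte Carlo simulations. Further,

$$P_b = P(\Gamma > t(-\infty, b))\, P(Z_\Gamma \geq b \mid \Gamma > t(-\infty, b), \tau \geq \Gamma) \approx \frac{1}{1 + e^{b}} \cdot \frac{E_\infty[\hat{\lambda}]}{E_\infty[\hat{\lambda}] + E_\infty[t(Z_{\hat{\lambda}}, b)]}.$$

We already have the approximations for $E_\infty[\hat{\lambda}]$ and $E_\infty[t(Z_{\hat{\lambda}}, b)]$ from Section IV-B. The approximation
for $E[t(Z_\Gamma, b) \mid A]$ can be obtained as follows (all expectations conditioned on $\{\tau \geq \Gamma\}$):

$$
\begin{aligned}
(1 - P_b)\, E[t(Z_\Gamma, b) \mid A] &= (1 - P_b)\, E[t(Z_\Gamma, b) \mid \{Z_\Gamma < b\}] \\
&= E[t(Z_\Gamma, b) \mid \{Z_\Gamma < b\} \cap \{\Gamma > t(-\infty, b)\}]\, P(\{\Gamma > t(-\infty, b)\} \cap \{Z_\Gamma < b\}) \\
&\quad + E[t(Z_\Gamma, b) \mid \{Z_\Gamma < b\} \cap \{\Gamma \leq t(-\infty, b)\}]\, P(\{\Gamma \leq t(-\infty, b)\} \cap \{Z_\Gamma < b\}).
\end{aligned}
$$

This can be computed using

$$P(\{\Gamma > t(-\infty, b)\} \cap \{Z_\Gamma < b\}) \approx \frac{1}{1 + e^{b}} \cdot \frac{E_\infty[t(Z_{\hat{\lambda}}, b)]}{E_\infty[\hat{\lambda}] + E_\infty[t(Z_{\hat{\lambda}}, b)]}$$

and

$$P(\{\Gamma \leq t(-\infty, b)\} \cap \{Z_\Gamma < b\}) = P(\{\Gamma \leq t(-\infty, b)\}) \approx 1 - \frac{1}{1 + e^{b}}.$$

To compute the conditional expectation of $t(Z_\Gamma, b)$, we need to subtract from $t(x, b)$ the mean of $\Gamma$ conditioned
on $\{\Gamma \leq t(x, b)\}$. Specifically,

$$E[t(Z_\Gamma, b) \mid \{Z_\Gamma < b\} \cap \{\Gamma > t(-\infty, b)\}] = t(b - \bar{r}, b) - \frac{1}{P(\Gamma \leq t(b - \bar{r}, b))} \sum_{k=1}^{t(b - \bar{r}, b)} k (1-\rho)^{k-1} \rho,$$

and

$$E[t(Z_\Gamma, b) \mid \{Z_\Gamma < b\} \cap \{\Gamma \leq t(-\infty, b)\}] = t(-\infty, b) - \frac{1}{P(\Gamma \leq t(-\infty, b))} \sum_{k=1}^{t(-\infty, b)} k (1-\rho)^{k-1} \rho.$$

Thus we have obtained approximations for all the terms in the new approximation for $E[\tau - \Gamma \mid \tau \geq \Gamma]$
in (40).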

In Table VII we now reproduce Table VI with a new column containing delay estimates computed
using the new ADD (for $E[\tau - \Gamma \mid \tau \geq \Gamma]$) approximation (40). The values show that all estimates are
nearly within 10% of the actual value.
In Table VIII we show the accuracy of the new ADD approximation (40), for various values of the
system parameters, by comparing it with simulations and also with (38). We also set PFA around $1 \times 10^{-3}$.
The table clearly demonstrates that the new ADD approximation predicts the ADD with less than 10%
error.



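TABLE VII: $\rho = 0.05$, $f_0 \sim \mathcal{N}(0, 1)$, $f_1 \sim \mathcal{N}(0.75, 1)$

| $a$ | $b$ | Simulations $E[\tau - \Gamma \mid \tau \geq \Gamma]$ | Analysis | New Analysis (ADD from (40)) | ANO% | PFA |
|---|---|---|---|---|---|---|
| 5.0 | 1.0 | 30 | 13 | 34 | 7.5% | $4.3 \times 10^{-3}$ |
| 9.0 | 1.0 | 42 | 25 | 46 | 7.5% | $7.9 \times 10^{-5}$ |
| 13.0 | 1.0 | 54 | 37 | 58 | 7.5% | $1.4 \times 10^{-6}$ |
| 18.0 | 1.0 | 69 | 52 | 73 | 7.5% | $9.7 \times 10^{-9}$ |
| 50.0 | 1.0 | 165 | 149 | 169 | 7.5% | $1.23 \times 10^{-22}$ |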



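TABLE VIII: $f_0 \sim \mathcal{N}(0, 1)$, $f_1 \sim \mathcal{N}(0.75, 1)$, PFA $\approx 10^{-3}$, ANO = 10% of Shiryaev ANO

| $\rho$ | $a$ | $b$ | ADD (Simulations) | ADD (New Analysis, (40)) | ADD (Analysis, (38)) | ANO% |
|---|---|---|---|---|---|---|
| 0.01 | 6.4 | 2.7 | 250 | 260 | 14.42 | 0.33% |
| 0.005 | 6.45 | 0.6 | 181 | 190 | 22.09 | 1.5% |
| 0.001 | 6.47 | -2.7 | 75 | 80 | 33.68 | 7.6% |
| 0.0005 | 6.47 | -3.49 | 74 | 79 | 36.49 | 8.4% |
| 0.0001 | 6.47 | -5.2 | 76 | 80 | 42.56 | 9.6% |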



V. Asymptotic Optimality and performance of $\gamma(a, b)$
A. Asymptotic Optimality of $\gamma(a, b)$

In Theorem 4 we saw that for a fixed $b$ and $\rho$,

$$E[\tau - \Gamma \mid \tau \geq \Gamma] = \left[\frac{a}{D(f_1, f_0) + |\log(1-\rho)|}\right] (1 + o(1)) \quad \text{as } a \to \infty.$$

We recall that, from [20], this is also the asymptotic delay of the Shiryaev algorithm.

Moreover, from Theorem 2 the PFA for $\gamma(a, b)$ is

$$\mathrm{PFA} = \left(e^{-a} \int_0^\infty e^{-x}\, dR(x)\right) (1 + o(1)) \quad \text{as } a \to \infty.$$
Again from [20], this is the PFA for the Shiryaev algorithm. We thus have the following asymptotic
optimality result for $\gamma(a, b)$.

Theorem 7. With $\gamma = \{\tau, S_1, \ldots, S_\tau\}$ define

$$\Delta(\alpha, \beta) = \{\gamma : \mathrm{PFA}(\gamma) \leq \alpha;\ \mathrm{ANO}(\gamma) \leq \beta\},$$

then for a fixed $\beta$ and $\rho$,

$$\mathrm{ADD}(\gamma(a(\alpha, \beta), b(\alpha, \beta))) = \left[\inf_{\gamma \in \Delta(\alpha, \beta)} \mathrm{ADD}(\gamma)\right] (1 + o(1)) \quad \text{as } \alpha \to 0. \tag{41}$$

Here, for each $\alpha$, $\beta$, $b(\alpha, \beta)$ is the smallest $b$ such that $\mathrm{ANO}(\gamma(a(\alpha, \beta), b(\alpha, \beta))) \leq \beta$ as $a \to \infty$.

Proof: Fix $b$ such that $\mathrm{ANO}(\gamma(a, b)) \leq \beta$ as $a \to \infty$. It may happen that the constraint $\beta$ is not met
with equality. Then we choose the smallest $b$ which satisfies the constraint $\beta$ as $a \to \infty$. This choice of
threshold $b$ is unique for a given $\beta$ because ANO is not a function of threshold $a$ as $a \to \infty$.

As $a \to \infty$, the PFA and ADD both approach the Shiryaev PFA (18) and Shiryaev delay (29),
respectively. Thus, as $a \to \infty$, $\gamma(a, b)$ is optimal over the class of all control policies $\Delta(\alpha, \beta)$ that satisfy
the constraints $\alpha$ and $\beta$. ■



B. Trade-off curves: Performance of $\gamma(a, b)$ for a fixed and moderate $\alpha$

Theorem 7 shows that for small values of PFA, $\gamma(a, b)$ is approximately optimal, i.e., it is not possible
to outperform $\gamma(a, b)$ by a significant margin. But for moderate values of PFA, it is not clear if there
exist algorithms which can significantly outperform $\gamma(a, b)$. Our aim is to partially address this issue in
this section.

In Fig. 4 we plot the ANO-ADD trade-off for the two-threshold algorithm. Specifically, we compare
the two-threshold algorithm with the classical Shiryaev algorithm and study how much ANO
can be reduced without significantly losing in terms of ADD. For Fig. 4 we pick four values of
$\rho$: 0.05, 0.01, 0.005, 0.001. For a fixed $\rho$, we fix $b = -\infty$ and select threshold $a$ such that
$\mathrm{PFA}(\gamma(a, b))$ equals the target value used for the figure. We then increase the threshold $b$ to have ANO% values of 75%, 50%, 30%, 15%.
We note that it was possible to reduce the ANO to 15% of $E[\Gamma]$ by increasing the threshold $b$ this way,
without affecting the probability of false alarm. Fig. 4 shows that we can reduce ANO by up to 25%
while getting approximately the same ADD performance as that of the Shiryaev algorithm. Moreover, if
we allow for a 10% increase in ADD compared to that of the Shiryaev algorithm, then we can reduce
ANO by up to 70% (see plot for ANO% = 30%).
[Figure 4: plot with horizontal axis $|\log(\rho)|$; curves for ANO% = 15%, ANO% = 30%, ANO% = 50%, ANO% = 75%, and Shiryaev.]

Fig. 4: Trade-off curves comparing performance of the two-threshold algorithm with the Shiryaev algorithm
for ANO% of 75, 50, 30 and 15%. $f_0 \sim \mathcal{N}(0, 1)$, $f_1 \sim \mathcal{N}(1, 1)$, and a fixed PFA.



Such behavior was also observed in Table V, where we saw that the delay for $\gamma(a, b)$ is approximately
equal to the Shiryaev delay for moderate to large ANO% values. Thus, for moderate PFA values, when
the ANO% is moderate to large, $\gamma(a, b)$ is approximately optimal.

C. Comparison with fractional sampling 

In this section we compare the performance of $\gamma(a, b)$ with the naive approach of fractional sampling,
in which an ANO% of $\epsilon$% is achieved by employing the Shiryaev algorithm and using a sample with
probability $\epsilon$. Also, in fractional sampling, when a sample is skipped, the posterior probability $p_k$ is
updated using the prior alone, as in $\gamma(a, b)$ when no observation is taken. Figure 5 compares the two schemes for ANO% of 50%. We also plot the performance
of the Shiryaev algorithm for the same values of PFA and $\rho$. The figure shows that $\gamma(a, b)$ helps in
reducing the observation cost by a significant margin as compared to the fractional sampling scheme.
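A minimal sketch of the fractional sampling baseline is given below. It rests on assumptions made for illustration: each sample is used independently with probability `eps`, a skipped sample triggers only the prior update $p \to p + (1-p)\rho$, observations are Gaussian with $f_0 \sim \mathcal{N}(0,1)$ and $f_1 \sim \mathcal{N}(\theta,1)$, and the test stops when $Z_k \geq a$. The function name and parameter values are hypothetical.

```python
import math
import random


def run_fractional_sampling(rho, theta, a, eps, rng):
    """One run of Shiryaev's test where each sample is used with probability eps.

    Returns (tau, change_point, obs_before_change).
    """
    change_point = 1
    while rng.random() >= rho:          # geometric change point
        change_point += 1

    z, k, obs_before_change = -math.inf, 0, 0
    while z < a:
        k += 1
        # Prior-only step; this is the whole update when the sample is skipped.
        z = math.log(math.exp(z) + rho) - math.log1p(-rho)
        if rng.random() < eps:          # sample is used with probability eps
            mean = theta if k >= change_point else 0.0
            x = rng.gauss(mean, 1.0)
            z += theta * x - 0.5 * theta * theta
            if k < change_point:
                obs_before_change += 1
    return k, change_point, obs_before_change


if __name__ == "__main__":
    rng = random.Random(0)
    print(run_fractional_sampling(rho=0.05, theta=0.75, a=5.0, eps=0.5, rng=rng))
```

Unlike the two-threshold rule, the decision to skip here does not depend on the current value of $Z_k$. With `eps = 1.0` the sketch reduces to the Shiryaev test, and with `eps = 0.0` the stopping time is deterministic.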

From our approximations, we know that for large $a$

$$\mathrm{ADD}(\gamma(a, b)) \approx \frac{a}{D(f_1, f_0) + |\log(1-\rho)|}.$$

When the K-L distance $D(f_1, f_0)$ dominates the sum $D(f_1, f_0) + |\log(1-\rho)|$, we would expect that
any scheme that ignores the past observations for observation control will perform poorly as compared
to one that relies on the state of the system to decide whether or not to take a sample in the next
time slot. This is verified by the figure: as $\rho \to 0$, we see a significant difference in the performance of
$\gamma(a, b)$ and the fractional sampling scheme. The figure also shows that as $\rho$ becomes large, and $|\log(1-\rho)|$ begins
to dominate the sum $D(f_1, f_0) + |\log(1-\rho)|$, the ADD performance of the fractional sampling scheme
approaches that of the two-threshold algorithm $\gamma(a, b)$.

[Figure 5: plot of ADD versus $|\log(\rho)|$ for ANO% = 50%.]

Fig. 5: Trade-off curves comparing performance of the two-threshold algorithm with the Fractional
Sampling Scheme for ANO% of 50%. $f_0 \sim \mathcal{N}(0, 1)$, $f_1 \sim \mathcal{N}(0.75, 1)$, and a fixed PFA.

VI. Conclusions 

We posed a data-efficient version of the classical Bayesian quickest change detection problem, where we 
control the number of observations taken before the change occurs. We obtained a two-threshold Bayesian 
algorithm that is asymptotically optimal, has good trade-off curves and is easy to design. We derived 
analytical approximations for the ADD, PFA and ANO performance of the two-threshold algorithm using 
which we can design the algorithm by choosing the thresholds. In particular, we showed that, when the
constraint on the PFA is moderate to small and that on the ANO is not very small, the two thresholds can
be set independently of each other. We also provided extensive numerical and simulation results that validate
our analysis. Our results indicate that our two-threshold algorithm can significantly save on the number 
of observations taken before the change, while maintaining the delay relatively unchanged. A comparison 
with the naive approach of fractional sampling shows that the two-threshold algorithm is indeed very 
efficient in using observations to detect the change. Our two-threshold algorithm has many engineering 
applications in settings where an abrupt change has to be detected in a process under observation, but 
there is a cost associated with acquiring the data needed to make accurate decisions. 



An important problem for future research is to see if two-threshold policies are optimal in non-Bayesian
(e.g., minimax) settings, where we do not have a prior on $\Gamma$. In particular, it is of interest to understand
how to update the algorithm metric in a non-Bayesian setting when we skip an observation. From an
application point of view, one can design a two-threshold algorithm based on the Shiryaev-Roberts or
CUSUM approaches [22], and use the undershoot of the metric when it goes below the threshold $b$ to
design the off times. Furthermore, if we are able to find useful lower bounds on delay for given false
alarm and ANO constraints, we may be able to use these to prove asymptotic optimality of such heuristic
algorithms, as is done for the standard quickest change detection problem [20], [23]. Also, such lower
bounds can possibly help in obtaining insights for cases where the observations are not i.i.d. [20], [23].
Other interesting problems in this area include the design of data-efficient optimal algorithms for robust
change detection and nonparametric change detection.



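Appendix to Section III-A

Proof of Theorem 1: We first show that $\eta_n$ with $b = -\infty$, and $Z_0$ a random variable, is a slowly
changing sequence. Let $Z_0$ take the value $z_0$; then

$$\eta_n = \log\left[e^{z_0} + \sum_{k=0}^{n-1} \rho (1-\rho)^k \prod_{i=1}^{k} \frac{f_0(X_i)}{f_1(X_i)}\right].$$

Define

$$\eta(Z_0) = \log\left[e^{Z_0} + \sum_{k=0}^{\infty} \rho (1-\rho)^k \prod_{i=1}^{k} \frac{f_0(X_i)}{f_1(X_i)}\right].$$

Note that $\eta(Z_0)$ as a function of $Z_0$ is well defined and finite under $P_1$. This is because, by Jensen's
inequality, for $Z_0 = z_0$,

$$E[\eta(z_0)] \leq \log\left[e^{z_0} + \sum_{k=0}^{\infty} \rho (1-\rho)^k\, E\prod_{i=1}^{k} \frac{f_0(X_i)}{f_1(X_i)}\right] = \log\left[e^{z_0} + \sum_{k=0}^{\infty} \rho (1-\rho)^k\right].$$

Thus

$$\eta(Z_0) = \log\left(e^{Z_0} + \rho\right) + \sum_{k=1}^{\infty} \log\left(1 + e^{-Z_k} \rho\right)\Big|_{b = -\infty}. \tag{42}$$

This implies that $\sum_{k=1}^{\infty} \log\left(1 + e^{-Z_k} \rho\right)$ converges a.s. for i.i.d. $\{X_k\}$ and $b = -\infty$. This series will also
converge with probability 1 if we condition on a set with positive probability.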

Let the change happen at $\Gamma = l$. We set $Z_0 = Z_\Gamma = Z_l$ and assume that $\{X_k\}$, $k \geq 1$, have density $f_1$,
which would happen after $\Gamma$. We first show that starting with the above $Z_0$, the sequence $\eta_n$ generated
in (11) is slowly changing.
To verify the first condition (12), from (11) note that

$$n^{-1} \max\{|\eta_1|, \ldots, |\eta_n|\} \leq n^{-1}\left[\left|\log\left(e^{z_0} + \rho\right)\right| + \sum_{k=1}^{n-1} \log\left(1 + e^{-Z_k}\rho\right) + \sum_{k=1}^{n} |\log L(X_k)|\, \mathbb{I}_{\{Z_k < b\}}\right].$$

Since $Z_k \to \infty$ a.s., $\log\left(1 + e^{-Z_k}\rho\right) \to 0$; also, $\mathbb{I}_{\{Z_k < b\}} \to 0$ a.s. Thus both the sequences $\{\log\left(1 + e^{-Z_k}\rho\right)\}$
and $\{|\log L(X_k)|\, \mathbb{I}_{\{Z_k < b\}}\}$ are Cesàro summable and have a Cesàro sum of zero. Thus the term inside
the square bracket above, when divided by $n$, goes to zero a.s. and hence also in probability. Thus the
first condition is verified.

To verify the second condition (13), we first obtain a bound on $|\eta_{n+k} - \eta_n|$:

$$|\eta_{n+k} - \eta_n| \leq \sum_{i=n}^{n+k-1} \log\left(1 + e^{-Z_i}\rho\right) + \sum_{i=n+1}^{n+k} |\log L(X_i)|\, \mathbb{I}_{\{Z_i < b\}}.$$

Thus,

$$\max_{1 \leq k \leq n\delta} |\eta_{n+k} - \eta_n| \leq \sum_{i=n}^{n+n\delta-1} \log\left(1 + e^{-Z_i}\rho\right) + \sum_{i=n+1}^{n+n\delta} |\log L(X_i)|\, \mathbb{I}_{\{Z_i < b\}} = d_n^1 + d_n^2.$$

Here, for convenience of computation, we use $d_n^1$ and $d_n^2$ to represent the first and second partial sums
respectively. Now,

$$P\left\{\max_{1 \leq k \leq n\delta} |\eta_{n+k} - \eta_n| > \epsilon\right\} \leq P(d_n^1 + d_n^2 > \epsilon),$$

and we bound the probability $P(d_n^1 + d_n^2 > \epsilon)$ as follows.

On the event $E = \{Z_k \geq b, \forall k \geq 0\}$, $d_n^2$ is identically zero; thus for $n$ large enough,

$$P(d_n^1 + d_n^2 > \epsilon \mid E) = P(d_n^1 > \epsilon \mid E) < \epsilon.$$

This is because $d_n^1$ behaves like a partial sum of a series of the type in (42). Since the series in (42)
converges if the random variables are generated i.i.d. $f_1$, it will also converge if conditioned on the event
$E$. Thus, the partial sum $d_n^1$ converges to 0 almost surely, and hence converges to 0 in probability, i.e.,
$P(d_n^1 > \epsilon \mid E) \to 0$. Select $n = n_1^*$ such that $\forall n > n_1^*$, $P(d_n^1 > \epsilon \mid E) < \epsilon$.
Define

$$L_z = \sup\{k \geq 1 : Z_{k-1} < b,\ Z_k \geq b\},$$

with $L_z = \infty$ if no such $k$ exists. On the event $E'$, which is the complement of $E$, $L_z$ is a.s. finite.
Then, by noting that $d_n^2 = 0$ for $L_z < n$, we get for $n$ large enough,

$$
\begin{aligned}
P(d_n^1 + d_n^2 > \epsilon \mid E') = P_{E'}(d_n^1 + d_n^2 > \epsilon) &\leq P_{E'}(d_n^1 + d_n^2 > \epsilon;\ L_z \geq n) + P_{E'}(d_n^1 + d_n^2 > \epsilon;\ L_z < n) \\
&\leq P_{E'}(L_z \geq n) + P_{E'}(d_n^1 + d_n^2 > \epsilon;\ L_z < n) \\
&= P_{E'}(L_z \geq n) + P_{E'}(d_n^1 > \epsilon;\ L_z < n) \\
&\leq P_{E'}(L_z \geq n) + P_{E'}(d_n^1 > \epsilon \mid L_z < n) \\
&\leq \epsilon/2 + \epsilon/2 = \epsilon.
\end{aligned}
$$

Since $L_z$ is almost surely finite, $P_{E'}(L_z \geq n) \to 0$ as $n \to \infty$. Thus we can select $n = n_2^*$ such that
$\forall n > n_2^*$, $P_{E'}(L_z \geq n) < \epsilon/2$. For the second term, note that conditioned on $L_z < n$, $d_n^1$ behaves like
a partial sum of a series of the type in (42), with $Z_0$ replaced by $Z_{L_z}$. Since the series in (42) converges
if the random variables are generated i.i.d. $f_1$ beyond $L_z$, it will also converge if conditioned on the event
$\{L_z < n\}$. Thus, the partial sum $d_n^1$ converges to 0 almost surely, and hence converges to 0 in probability,
i.e., $P_{E'}(d_n^1 > \epsilon \mid L_z < n) \to 0$. Select $n = n_3^*$ such that $\forall n > n_3^*$, $P(d_n^1 > \epsilon \mid L_z < n) < \epsilon/2$. Then
$n^* = \max\{n_1^*, n_2^*, n_3^*\}$ is the desired $n^*$, and pick any $\delta > 0$. Then for $n > n^*$,

$$P(d_n^1 + d_n^2 > \epsilon) = P(d_n^1 + d_n^2 > \epsilon \mid E) P(E) + P(d_n^1 + d_n^2 > \epsilon \mid E') P(E') < \epsilon P(E) + \epsilon P(E') \leq \epsilon.$$


Since the sequence $\eta_n$ is slowly changing, according to [18], the asymptotic distribution of the overshoot
when $Z_k$ crosses a large boundary under $f_1$ is $R(x)$. Thus we have the following result,

$$\lim_{a \to \infty} P_l\left[Z_\tau - a \leq x \mid \tau \geq l\right] = R(x),$$

where $P_l$ is the probability measure with the change happening at $l$. Now,

$$P[Z_\tau - a \leq x \mid \tau \geq \Gamma] = \sum_{l=1}^{\infty} P_l\left[Z_\tau - a \leq x \mid \tau \geq l\right] P(\Gamma = l \mid \tau \geq \Gamma),$$

and

$$\lim_{a \to \infty} P_l\left[Z_\tau - a \leq x \mid \tau \geq l\right] P(\Gamma = l \mid \tau \geq \Gamma) = R(x)\, P(\Gamma = l) \leq 1.$$

Hence we have the desired result by the dominated convergence theorem. ■

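Appendix to Section III-B

Proof of Lemma 1: Since $p_\tau \geq A$ implies $Z_\tau \geq a$, we have

$$\frac{1}{1 + e^{-Z_\tau}} \geq \frac{1}{1 + e^{-a}}.$$

The required result is obtained by obtaining upper and lower bounds on PFA as follows.

$$\mathrm{PFA} = E[1 - p_\tau] = E\left[\frac{1}{1 + e^{Z_\tau}}\right] \leq E\left[e^{-Z_\tau}\right].$$

Also,

$$
\mathrm{PFA} = E[1 - p_\tau] = E\left[\frac{1}{1 + e^{Z_\tau}}\right]
= E\left[\frac{1}{e^{Z_\tau}} \cdot \frac{1}{1 + e^{-Z_\tau}}\right]
\geq E\left[\frac{1}{e^{Z_\tau}} \cdot \frac{1}{1 + e^{-a}}\right]
= E\left[e^{-Z_\tau}\right] (1 + o(1)) \quad \text{as } a \to \infty.
$$

Thus,

$$\mathrm{PFA} = E\left[e^{-Z_\tau}\right](1 + o(1)) = e^{-a} E\left[e^{-(Z_\tau - a)}\right](1 + o(1)) \quad \text{as } a \to \infty.$$

Now note that

$$E\left[e^{-(Z_\tau - a)}\right] = E\left[e^{-(Z_\tau - a)} \mid \tau \geq \Gamma\right](1 - P(\tau < \Gamma)) + E\left[e^{-(Z_\tau - a)} \mid \tau < \Gamma\right] P(\tau < \Gamma).$$

Since $P(\tau < \Gamma) = E[1 - p_\tau] \leq 1 - A \leq e^{-a}$, we can write

$$\mathrm{PFA} = e^{-a} E\left[e^{-(Z_\tau - a)} \mid \tau \geq \Gamma\right](1 + o(1)) \quad \text{as } a \to \infty.$$

This proves the lemma.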


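Appendix to Section III-C

Proof of Theorem 3: Each time $Z_k$ crosses $b$ from below, it satisfies

$$b \leq Z_k \leq b + \log\frac{1}{1-\rho} + \log\left(1 + e^{-b}\rho\right).$$

Define $b_1 = b + \log\frac{1}{1-\rho} + \log\left(1 + e^{-b}\rho\right)$. Then $b_1 \to b$ as $\rho \to 0$. Also, each time $Z_k$ crosses $b$ from
below, the average time for $Z_k$ to reach $a$ can be decreased by setting $Z_k = b_1$ and increased by setting
$Z_k = b$. Let $N$ ($N_1$) be one plus the number of times $Z_k$ goes below $b$ before it crosses $a$, when it is
reset to $b$ ($b_1$) each time it crosses $b$ from below.
Now recall the three disjoint events:

$$A = \{Z_\Gamma < b\},$$

$$B = \{Z_\Gamma \geq b;\ Z_k \nearrow b\},$$

$$C = \{Z_\Gamma \geq b;\ Z_k \nearrow a\}.$$

We can write

$$E[\tau - \Gamma \mid \tau \geq \Gamma] = E[\tau - \Gamma;\ A \mid \tau \geq \Gamma] + E[\tau - \Gamma;\ B \mid \tau \geq \Gamma] + E[\tau - \Gamma;\ C \mid \tau \geq \Gamma]. \tag{43}$$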
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Now consider each of the three terms on the right hand side of the above equation. 

Under the event A, the process starts below b and reaches a after multiple up-crossings of the 
threshold b. Then, 



E[T-r-A\T>T]< E[i(Zr, b) \A, t>T] P(^|t > T) + Ei 



N 



.k=l 



p{a\t > r). 



(44) 



This upper bound was obtained by resetting to b each time it crosses b from below. Similarly, we can 
get a lower bound by setting = bi each time Z^ crosses b from below. Thus, 



E[T-r;A\T > r] > E[t{Zr,b)\A,T > T] F{A\t > T) + Ei 
Now by Wald's lemma HU, 
El 



,fc=i 



pM|t > r). 



.fc=l 



Ei[iVi]Ei[A(6i)] 
> Ei[iV]Ei[A(6)] = El 



N 



.k=l 



ADD' 



Thus, 



E[r-r;^|r > T] 



E[t{Zr,b)\A,T > r] P{A\t > T) 



(1 + 0(1)) as 0. 



+ ADD" P(^|r>r) 

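Under the event $B$, the process $Z_k$ starts above $b$ and crosses $b$ before $a$. It then has multiple up-crossings of $b$, similar to the case of event $A$. Arguing in a similar manner, we get

\[
\mathrm{E}[\tau - \Gamma; B \mid \tau \ge \Gamma] = \Big( \mathrm{E}[\Lambda(Z_\Gamma) \mid B, \tau \ge \Gamma]\, \mathrm{P}(B \mid \tau \ge \Gamma) + \mathrm{ADD}^s\, \mathrm{P}(B \mid \tau \ge \Gamma) \Big)(1 + o(1)) \quad \text{as } \rho \to 0.
\]

Similarly, considering the event $C$, we get

\[
\mathrm{E}[\tau - \Gamma; C \mid \tau \ge \Gamma] = \mathrm{E}[\Lambda(Z_\Gamma) \mid C, \tau \ge \Gamma]\, \mathrm{P}(C \mid \tau \ge \Gamma)\,(1 + o(1)) \quad \text{as } \rho \to 0.
\]

Substituting in (43) we get the desired result (22). ■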
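Proof of Lemma 2: Based on $\Phi$, we define two new recursions: one in which the evolution of $Z_k$ is truncated at $b$,

\[
\tilde{Z}_{k+1} =
\begin{cases}
\Phi(\tilde{Z}_k) & \text{if } \Phi(\tilde{Z}_k) \ge b, \\
b & \text{if } \Phi(\tilde{Z}_k) < b,
\end{cases}
\]

and another in which the overshoot is ignored each time the Shiryaev recursion crosses $b$ from below,

\[
\hat{Z}_{k+1} =
\begin{cases}
b & \text{if } \hat{Z}_k < b \text{ and } \Phi(\hat{Z}_k) \ge b, \\
\Phi(\hat{Z}_k) & \text{otherwise}.
\end{cases}
\]

Based on these two recursions we define two new stopping times:

\[
\tilde{\nu}_b = \inf\{k \ge 1 : \Phi(\tilde{Z}_{k-1}) > a,\ \tilde{Z}_0 = b\}, \qquad
\hat{\nu}_b = \inf\{k \ge 1 : \Phi(\hat{Z}_{k-1}) > a,\ \hat{Z}_0 = b\}.
\]

These two stopping times stochastically lower and upper bound the Shiryaev stopping time $\nu_b$ defined in (24), i.e.,

\[
\mathrm{E}_1[\tilde{\nu}_b] \le \mathrm{E}_1[\nu_b] \le \mathrm{E}_1[\hat{\nu}_b]. \tag{45}
\]

Recall that

\[
\nu(x, y) = \inf\{k \ge 1 : Z_k > y,\ Z_0 = x\}.
\]

Using Wald's lemma [18], we can get the following expressions:

\[
\mathrm{E}_1[\tilde{\nu}_b] = \frac{\mathrm{E}_1[\Lambda]}{\mathrm{P}_1(Z_\Lambda > a)}, \qquad
\mathrm{E}_1[\hat{\nu}_b] = \frac{\mathrm{E}_1[\Lambda] + \mathrm{E}_1[\nu(Z_\Lambda, b); \{Z_\Lambda < b\}]}{\mathrm{P}_1(Z_\Lambda > a)}. \tag{46}
\]

Multiplying and dividing $\mathrm{ADD}^s$ by $\mathrm{E}_1[\Lambda]$ we get

\[
\mathrm{ADD}^s = \frac{\mathrm{E}_1[\Lambda] + \mathrm{E}_1[t(Z_\Lambda, b); \{Z_\Lambda < b\}]}{\mathrm{E}_1[\Lambda]} \cdot \frac{\mathrm{E}_1[\Lambda]}{\mathrm{P}_1(Z_\Lambda > a)}
= \mathrm{E}_1[\tilde{\nu}_b]\, \frac{\mathrm{E}_1[\Lambda] + \mathrm{E}_1[t(Z_\Lambda, b); \{Z_\Lambda < b\}]}{\mathrm{E}_1[\Lambda]}
= \mathrm{E}_1[\tilde{\nu}_b](1 + o(1)) \quad \text{as } a \to \infty.
\]

The last equality follows because $\mathrm{E}_1[\Lambda] \to \infty$ as $a \to \infty$, while $\mathrm{E}_1[t(Z_\Lambda, b); \{Z_\Lambda < b\}]$ is not a function of $a$. Similarly, multiplying and dividing $\mathrm{ADD}^s$ by $\mathrm{E}_1[\Lambda] + \mathrm{E}_1[\nu(Z_\Lambda, b); \{Z_\Lambda < b\}]$ we get

\[
\mathrm{ADD}^s = \mathrm{E}_1[\hat{\nu}_b](1 + o(1)) \quad \text{as } a \to \infty.
\]

Using these two expressions for $\mathrm{ADD}^s$ and the relationship $\mathrm{E}_1[\tilde{\nu}_b] \le \mathrm{E}_1[\nu_b] \le \mathrm{E}_1[\hat{\nu}_b]$, we have

\[
\mathrm{ADD}^s = \mathrm{E}_1[\nu_b](1 + o(1)) \quad \text{as } a \to \infty.
\]

■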

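Proof of Theorem: Consider the upper bound (44):

\[
\mathrm{E}[\tau - \Gamma; A \mid \tau \ge \Gamma] \le \mathrm{E}[t(Z_\Gamma, b) \mid A, \tau \ge \Gamma]\, \mathrm{P}(A \mid \tau \ge \Gamma) + \mathrm{ADD}^s\, \mathrm{P}(A \mid \tau \ge \Gamma).
\]

Similarly, the upper bounds corresponding to the other two events $B$ and $C$ are

\[
\mathrm{E}[\tau - \Gamma; B \mid \tau \ge \Gamma] \le \mathrm{E}[\Lambda(Z_\Gamma) \mid B, \tau \ge \Gamma]\, \mathrm{P}(B \mid \tau \ge \Gamma) + \mathrm{ADD}^s\, \mathrm{P}(B \mid \tau \ge \Gamma)
\]

and

\[
\begin{aligned}
\mathrm{E}[\tau - \Gamma; C \mid \tau \ge \Gamma] &= \mathrm{E}[\Lambda(Z_\Gamma) \mid C, \tau \ge \Gamma]\, \mathrm{P}(C \mid \tau \ge \Gamma) \\
&\le \mathrm{E}_1[\Lambda(b) \mid Z_{\Lambda(b)} > a]\, \mathrm{P}(C \mid \tau \ge \Gamma) \\
&\le \mathrm{ADD}^s\, \mathrm{P}(C \mid \tau \ge \Gamma).
\end{aligned}
\]

Substituting in (43) we get

\[
\begin{aligned}
\mathrm{E}[\tau - \Gamma \mid \tau \ge \Gamma] &= \mathrm{E}[\tau - \Gamma; A \mid \tau \ge \Gamma] + \mathrm{E}[\tau - \Gamma; B \mid \tau \ge \Gamma] + \mathrm{E}[\tau - \Gamma; C \mid \tau \ge \Gamma] \\
&\le \mathrm{ADD}^s + \mathrm{E}[t(Z_\Gamma, b) \mid A, \tau \ge \Gamma] + \mathrm{E}[\Lambda(Z_\Gamma) \mid B, \tau \ge \Gamma].
\end{aligned} \tag{47}
\]

In equation (47), we observe that, except for $\mathrm{ADD}^s$, the other terms are not a function of the threshold $a$. Thus we have

\[
\mathrm{E}[\tau - \Gamma \mid \tau \ge \Gamma] \le \mathrm{ADD}^s (1 + o(1)) \quad \text{as } a \to \infty.
\]

■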

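Proof of Lemma 5: First note that by definition (20), $Z_{t(x,y)} > y \ge Z_{t(x,y)-1}$. Also, from (9),

\[
Z_{t(x,y)} = Z_{t(x,y)-1} + \log\frac{1}{1-\rho} + \log\left(1 + e^{-Z_{t(x,y)-1}}\rho\right)
\le y + \log\frac{1}{1-\rho} + \log(1 + e^{-y}\rho).
\]

Thus

\[
y < Z_{t(x,y)} \le y + \log\frac{1}{1-\rho} + \log(1 + e^{-y}\rho),
\]

or equivalently

\[
e^{y} < e^{Z_{t(x,y)}} \le \frac{e^{y}(1 + e^{-y}\rho)}{1-\rho}.
\]

Further, the recursion (9) can be written in terms of $e^{Z_k}$ for $k \ge 0$:

\[
e^{Z_{k+1}} = \frac{e^{Z_k} + \rho}{1-\rho}.
\]

Using this we can write an expression for $e^{Z_k}$:

\[
e^{Z_k} = \frac{e^{x} + 1}{(1-\rho)^{k}} - 1.
\]

Using the bounds for $Z_{t(x,y)}$ obtained above, we get

\[
e^{y} < \frac{e^{x} + 1}{(1-\rho)^{t(x,y)}} - 1 \le \frac{e^{y}(1 + e^{-y}\rho)}{1-\rho}.
\]

This gives us bounds for $t(x, y)$:

\[
\frac{\log(1 + e^{y}) - \log(1 + e^{x})}{|\log(1-\rho)|} < t(x, y) \le \frac{\log\left(1 + \frac{e^{y}(1 + e^{-y}\rho)}{1-\rho}\right) - \log(1 + e^{x})}{|\log(1-\rho)|}.
\]

By keeping $x, y$ fixed and taking $\rho \to 0$ we get (34). ■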
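The bounds on $t(x, y)$ can be checked numerically by iterating the no-observation recursion directly. The sketch below is illustrative only: the function names are hypothetical, and it assumes $x < y \le b$, so that no observations are taken and the update is $Z_{k+1} = Z_k + |\log(1-\rho)| + \log(1 + e^{-Z_k}\rho)$.

```python
import math


def step_no_obs(z, rho):
    """One step of the recursion when no observation is taken."""
    return z + abs(math.log(1.0 - rho)) + math.log(1.0 + math.exp(-z) * rho)


def t_exact(x, y, rho):
    """Smallest k >= 0 such that Z_k > y, starting from Z_0 = x."""
    z, k = x, 0
    while z <= y:
        z = step_no_obs(z, rho)
        k += 1
    return k


def t_bounds(x, y, rho):
    """Lower and upper bounds on t(x, y) from the proof of the lemma."""
    denom = abs(math.log(1.0 - rho))
    base = math.log1p(math.exp(x))
    lower = (math.log1p(math.exp(y)) - base) / denom
    upper = (math.log1p((math.exp(y) + rho) / (1.0 - rho)) - base) / denom
    return lower, upper


def z_closed_form(x, k, rho):
    """Closed form Z_k = log((e^x + 1) / (1 - rho)^k - 1) for Z_0 = x."""
    return math.log((math.exp(x) + 1.0) / (1.0 - rho) ** k - 1.0)


if __name__ == "__main__":
    x, y = -3.0, 1.0
    for rho in (0.1, 0.01, 0.001):
        lower, upper = t_bounds(x, y, rho)
        print(rho, lower, t_exact(x, y, rho), upper)
```

Since $1 + (e^{y} + \rho)/(1-\rho) = (1 + e^{y})/(1-\rho)$, the two bounds differ by exactly one, so $t(x, y)$ is pinned down to a single integer and the ratio of $t(x, y)$ to the lower bound tends to one as $\rho \to 0$.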



Appendix to Section III-D

Proof of Lemma 3: Each time $Z_k$ crosses $b$ from below, it satisfies

\[
b \le Z_k \le b + \log\frac{1}{1-\rho} + \log(1 + e^{-b}\rho).
\]

Define $b_1 = b + \log\frac{1}{1-\rho} + \log(1 + e^{-b}\rho)$. Then $b_1 \to b$ as $\rho \to 0$. Also, each time $Z_k$ crosses $b$ from below, the average number of observations used before $\Gamma$ can be increased by setting $Z_k = b_1$ and decreased by setting $Z_k = b$. This is because of the geometric nature of the change. Let $Z_k = x$ when it crosses $b$ from below, and suppose we reset $Z_k$ to $b_1$. Then the number of observations used before the change, on average, would be the number of observations used before $Z_k$ reaches $x$ from $b_1$, plus the number of observations used from there onwards as if the process started at $x$. Similar reasoning explains why the average number of observations used decreases if we reset $Z_k$ to $b$ each time it crosses $b$ from below.

Define the following stopping time:

\[
\lambda^{x} = \inf\{k \ge 1 : Z_{k-1} < b \text{ and } Z_k \ge b, \text{ or } k \ge \Gamma;\ Z_0 = x \ge b,\ a = \infty\}.
\]

Thus, $\lambda^{x}$ is the time for $Z_k$, started at $Z_0 = x$ with $a = \infty$, to stop the first time either $Z_k$ approaches $b$ from below or the change happens. Also, let $\delta^{x} \in (0, 1)$ be such that $\lambda^{x}\delta^{x}$ is the number of observations used before $Z_k$ was stopped by $\lambda^{x}$, i.e., $\delta^{x}$ is the fraction of $\lambda^{x}$ for which $Z_k \ge b$. If $\{\lambda^{b}_k\}$ and $\{\lambda^{b_1}_k\}$ are sequences with the distributions of $\lambda^{b}$ and $\lambda^{b_1}$, respectively, and if $L^{x}$ is the number of times $Z_k$ crosses $b$ from below and is set to $x$ at each such instant, then













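\[
\begin{aligned}
\frac{\mathrm{E}_\infty[\lambda^{b}\delta^{b}]}{\mathrm{P}_\infty[\Gamma < \lambda^{b}]}
= \mathrm{E}_\infty\left[\sum_{k=1}^{L^{b}} \lambda^{b}_k \delta^{b}_k\right]
&\le \mathrm{E}\left[\sum_{k=t(b)}^{\Gamma-1} S_k \,\Big|\, \Gamma > t(b),\ a = \infty\right] \\
&\le \mathrm{E}_\infty\left[\sum_{k=1}^{L^{b_1}} \lambda^{b_1}_k \delta^{b_1}_k\right]
= \frac{\mathrm{E}_\infty[\lambda^{b_1}\delta^{b_1}]}{\mathrm{P}_\infty[\Gamma < \lambda^{b_1}]}.
\end{aligned}
\]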
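Here the equalities follow from Wald's lemma [18]. In the above, $L^{x}$ is $\mathrm{Geom}(\mathrm{P}_\infty[\Gamma < \lambda^{x}])$, and hence $\mathrm{E}_\infty[L^{x}] = 1 / \mathrm{P}_\infty[\Gamma < \lambda^{x}]$. Also note that

\[
\frac{\mathrm{P}_\infty[\Gamma < \lambda^{b_1}]}{\mathrm{P}_\infty[\Gamma < \lambda^{b}]} \to 1 \quad \text{as } \rho \to 0.
\]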

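Further, for $x = b_1$ or $x = b$, define $\lambda(x)$ based on (30) as

\[
\lambda(x) = \inf\{k \ge 1 : Z_k < b,\ Z_0 = x \ge b,\ a = \infty\}.
\]

It is clear that $\lambda(b) = \lambda$. Thus we have, for both $x = b_1$ and $x = b$,

\[
\mathrm{E}_\infty[\lambda^{x}\delta^{x}] = \mathrm{E}_\infty[\lambda^{x}\delta^{x} \mid \Gamma < \lambda^{x}\delta^{x}]\, \mathrm{P}_\infty[\Gamma < \lambda^{x}\delta^{x}] + \mathrm{E}_\infty[\lambda^{x}\delta^{x} \mid \Gamma \ge \lambda^{x}\delta^{x}]\, \mathrm{P}_\infty[\Gamma \ge \lambda^{x}\delta^{x}]
\to \mathrm{E}_\infty[\lambda(x)] \quad \text{as } \rho \to 0.
\]

Here, the result follows because, as $\rho \to 0$, $\lambda^{x}\delta^{x}$ converges a.s. to a finite limit and $\mathrm{P}_\infty[\Gamma < \lambda^{x}\delta^{x}] \to 0$. Also, for the same reason, $\mathrm{P}_\infty[\Gamma \ge \lambda^{x}\delta^{x}] \to 1$ as $\rho \to 0$. Moreover, since $b_1 \to b$ as $\rho \to 0$, we have, as $\rho \to 0$,

\[
\mathrm{E}_\infty[\lambda(b_1)] \to \mathrm{E}_\infty[\lambda(b)] = \mathrm{E}_\infty[\lambda].
\]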



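Thus,

\[
\mathrm{E}\left[\sum_{k=t(b)}^{\Gamma-1} S_k \,\Big|\, \Gamma > t(b),\ a = \infty\right] = \frac{\mathrm{E}_\infty[\lambda]}{\mathrm{P}_\infty[\Gamma < \lambda^{b}]}\,(1 + o(1)) \quad \text{as } \rho \to 0.
\]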



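Proof of Lemma: Since $\mathrm{P}(\tau \ge \Gamma) \to 1$ as $a \to \infty$,

\[
\begin{aligned}
\mathrm{P}(\Gamma > t(b) \mid \tau \ge \Gamma) &= \mathrm{P}(\Gamma > t(b)) + o(1) \quad \text{as } a \to \infty \\
&= \frac{1}{1 + e^{z_0}}\,(1-\rho)^{t(b)} + o(1) \quad \text{as } a \to \infty.
\end{aligned}
\]

From (34) in Lemma 5 with $y = b$ and $x = z_0$, we have

\[
t(z_0, b) = \frac{\log(1 + e^{b}) - \log(1 + e^{z_0})}{|\log(1-\rho)|}\,(1 + o(1)) \quad \text{as } \rho \to 0.
\]

From this, it is easy to show that

\[
(1-\rho)^{t(z_0, b)} \to \frac{1 + e^{z_0}}{1 + e^{b}} \quad \text{as } \rho \to 0.
\]

By substituting this in the expression for $\mathrm{P}(\Gamma > t(b) \mid \tau \ge \Gamma)$ we get the desired result.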
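The limit of $(1-\rho)^{t(z_0, b)}$ used in this proof can be illustrated numerically. The following sketch is a hedged example rather than part of the paper: the function names are hypothetical, and it assumes $z_0 < b$ so that the process evolves without observations until it first exceeds $b$.

```python
import math


def t_no_obs(z0, b, rho):
    """Number of no-observation steps for Z_k to exceed b from Z_0 = z0 < b."""
    z, k = z0, 0
    while z <= b:
        z += abs(math.log(1.0 - rho)) + math.log(1.0 + math.exp(-z) * rho)
        k += 1
    return k


def survival_factor(z0, b, rho):
    """Compute (1 - rho)^{t(z0, b)} by direct iteration."""
    return (1.0 - rho) ** t_no_obs(z0, b, rho)


def limiting_value(z0, b):
    """Claimed limit (1 + e^{z0}) / (1 + e^{b}) as rho -> 0."""
    return (1.0 + math.exp(z0)) / (1.0 + math.exp(b))


if __name__ == "__main__":
    z0, b = -2.0, 0.5
    for rho in (1e-1, 1e-2, 1e-3):
        print(rho, survival_factor(z0, b, rho), limiting_value(z0, b))
```

Because $t(z_0, b)$ lies within one step of $(\log(1 + e^{b}) - \log(1 + e^{z_0})) / |\log(1-\rho)|$, the computed factor is within a multiplicative $(1-\rho)$ of the limiting value for every $\rho$.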
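Proof of Theorem: Using Theorem 4 we write $\mathrm{ANO}_1$ as

\[
\begin{aligned}
\mathrm{ANO}_1 &= \mathrm{E}[\tau - \Gamma \mid \tau \ge \Gamma] \left(1 - \frac{T_b - 1}{\mathrm{E}[\tau - \Gamma \mid \tau \ge \Gamma]}\right) \\
&= \mathrm{E}_1[\nu_b] \left(1 - \frac{T_b - 1}{\mathrm{E}[\tau - \Gamma \mid \tau \ge \Gamma]}\right)(1 + o(1)).
\end{aligned}
\]

We now obtain an upper bound on $\frac{T_b - 1}{\mathrm{E}[\tau - \Gamma \mid \tau \ge \Gamma]}$ which goes to zero as $a \to \infty$.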

Recall that $A$ and $B$ are the events under which excursions below $b$ are possible. The passage to $a$ is through multiple cycles below $b$, and the time spent below $b$ in each cycle can be bounded by $t(-\infty, b)$. Define $N_A$ and $N_B$ as one plus the number of cycles below $b$, under events $A$ and $B$ respectively. Then,

\[
T_b - 1 \le T_b \le \mathrm{P}_1(A)\, t(-\infty, b)\, \mathrm{E}[N_A] + \mathrm{P}_1(B)\, t(-\infty, b)\, \mathrm{E}[N_B].
\]

The averages $\mathrm{E}[N_A]$ and $\mathrm{E}[N_B]$ can be written as a series of probabilities, where each term corresponds to the event that $Z_k$ goes below $b$, and not above $a$, each time it crosses $b$ from below. Each of these probabilities can be maximized by setting $Z_k$ to $b$ each time it crosses $b$ from below. Hence, $\mathrm{E}[N_A] \le \mathrm{E}[N]$ and $\mathrm{E}[N_B] \le \mathrm{E}[N]$. This gives a bound on $T_b - 1$:

\[
T_b - 1 \le t(-\infty, b)\, \mathrm{E}[N].
\]



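By using (45) we get, as $a \to \infty$,

\[
\frac{T_b - 1}{\mathrm{E}[\tau - \Gamma \mid \tau \ge \Gamma]} \le \frac{t(-\infty, b)\, \mathrm{E}[N]}{\mathrm{E}_1[\nu_b]}\,(1 + o(1)) \le \frac{t(-\infty, b)\, \mathrm{E}[N]}{\mathrm{E}_1[\tilde{\nu}_b]}\,(1 + o(1)).
\]

From (46) we know that $\mathrm{E}_1[\tilde{\nu}_b] = \mathrm{E}_1[\Lambda]\, \mathrm{E}[N]$. Thus the upper bound on $\frac{T_b - 1}{\mathrm{E}[\tau - \Gamma \mid \tau \ge \Gamma]}$ goes to $0$ as $a \to \infty$. This proves the theorem. ■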